AI has yet to learn the relevant nuances of what we call context

GPT-3 is the result of many years of work at the world’s leading AI labs, including OpenAI, an independent organization backed by $1 billion in funding from Microsoft, as well as labs at Google and Meta. GPT-3 is what AI scientists call a neural network, a mathematical system loosely modelled on the web of neurons in the brain. As might be expected, there is more than one mathematical model on which such systems are built.

At Google, a system called BERT (short for Bidirectional Encoder Representations from Transformers) was also trained on a large selection of words found online. It can guess the missing words in any part of millions of sentences. OpenAI is also refining another neural network called DALL-E, whose main function is to automatically generate images from a wide range of text captions expressed in natural language.
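To make that missing-word guessing concrete, here is a minimal sketch, assuming the open-source Hugging Face ‘transformers’ library and the publicly released ‘bert-base-uncased’ checkpoint; it is only an illustration of the idea, not a description of Google’s internal systems.

    # Ask a pretrained BERT model to fill in a masked word and show its top guesses.
    from transformers import pipeline

    unmasker = pipeline("fill-mask", model="bert-base-uncased")
    for guess in unmasker("The weather in Mumbai is [MASK] today."):
        print(guess["token_str"], round(guess["score"], 3))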

Such large-scale neural networks are now called foundation models, as they form the basis on which all kinds of AI applications can be written. They differ from other models that use smaller data sets to train AI systems, as their training involves scouring almost every piece of information available on the web, a data store that doubles in size roughly every two years. The website Live-counter.com, which attempts to track the size of the internet, put it at around 40 zettabytes in 2020. By the two-year doubling rule, that number should now be closer to 80 zettabytes. For reference, one zettabyte is equal to one trillion gigabytes.
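The arithmetic behind that estimate is simple; the short Python sketch below merely restates the doubling rule using the figures cited above.

    # 40 zettabytes in 2020, doubling every two years.
    size_2020_zb = 40
    years_elapsed = 2022 - 2020
    size_now_zb = size_2020_zb * 2 ** (years_elapsed / 2)
    print(size_now_zb)                       # 80.0 zettabytes
    print(size_now_zb * 1_000_000_000_000)   # in gigabytes (1 ZB = 1 trillion GB)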

Foundation models have since become popular as they take on traditional methods of training AI programs with smaller data sets, and they were expected to be game changers. However, researchers are now finding that they have several limitations. A paper contributed to by over 100 researchers, available at https://arxiv.org/abs/2108.07258 from Cornell University, has this to say: “AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).”

“Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.”

Simply put, foundation models are far from ready for prime time. Their Achilles heel is that their understanding of context is inadequate. For example, a simple query made by one person to another, such as “How’s your wife?”, can elicit a large number of possible responses depending on time, circumstances, context, and the strength of one’s relationship (both with one’s wife and with one’s interlocutor).

Even in my own case, I can think of a large number of different answers, such as “she’s fine” or “she’s gardening” or “she’s exaggerating” and so on, depending on who is asking, and when, where and under what circumstances. A human mind will add all kinds of information to the answer to that question, including tongue-in-cheek humour, instinct or deliberate objection.

While GPT-3 and others have taken several steps towards handling context, and can independently churn out prose that includes many contextual nuances, they are unable to cover context such as urgency of tone, sensitivity in dialogue of the kind that often features in medical discussions, or even something as loaded as a question from your mother-in-law about your wife.

To be fair, these models already have something called ‘late binding context’; for example, their responses could be linked to the latest version of some database, such as Mint’s coverage of India’s performance in the Commonwealth Games. But that too relies only on an anchor database with the latest information, not a full appreciation of all relevant information.
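As a rough illustration of what late binding means in practice, here is a hypothetical Python sketch; the function names, the data source and the medal figures are invented stand-ins, not a real API or real results.

    # Hypothetical late-binding context: the answer is anchored to whatever a
    # frequently updated data source says at the moment the question is asked.
    def fetch_latest_medal_tally(country):
        # Stand-in for a live database or news-feed lookup; values are placeholders.
        return {"country": country, "gold": 20, "silver": 15, "bronze": 25}

    def answer_with_late_binding(question):
        tally = fetch_latest_medal_tally("India")
        # A real system would hand this fresh context to the language model
        # along with the question; here we simply return the anchored facts.
        return (f"As of the latest update, {tally['country']} has won "
                f"{tally['gold']} gold, {tally['silver']} silver and "
                f"{tally['bronze']} bronze medals.")

    print(answer_with_late_binding("How is India doing at the Commonwealth Games?"))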

To be ready for prime time, foundation models will need to make significant progress in incorporating the ability to dynamically understand and apply many aspects of late-binding context. For a truly interactive AI, complete command of context is paramount.

Siddharth Pai is the co-founder of Siana Capital and the author of ‘Techproof Me’.
