In this article, we will deal precisely with these, discussing:
- What are language models?
- Where are we at in 2022?
- Is GPT-3 the most powerful language model in 2022?
Let’s dive into it.
What are language models (+ examples)
The term can sound intimidating, but language models are probably something you already take advantage of every day without even noticing. Just think of the words your phone suggests on top of the keyboard when you’re typing a text. These are a language model’s best guesses for what you might want to type next, based on the many different kinds of written text it has seen.
According to NLP experts Jurafsky and Martin’s definition:
“A language model is a probability distribution over sequences of words.”
Put more simply, a language model is a way of knowing which words are likely to be used together in a sentence. These predictions are based on the model’s training. First, the model is fed text corpora in one or many languages; these collections can include web pages, social media posts, articles, and virtually any other source deemed useful. The model then figures out which words are likely to be used together based on how often they appear together in the corpora.
From this explanation, it should be clear that language models are entirely based on the information they are fed. The language model itself doesn’t understand what you are writing about. It’s merely playing a probability game with words.
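That “probability game” can be sketched with a toy bigram model, the simplest kind of language model: count how often each word follows another in a corpus, then turn the counts into probabilities. The three-sentence corpus below is a made-up illustration; real models train on billions of words.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (illustration only; real corpora span
# web pages, books, articles, and more).
corpus = [
    "i love pizza",
    "i love cats",
    "i like cats",
]

# Count how often each word follows another (bigram counts).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def next_word_probs(word):
    """Relative frequency of each word that follows `word`."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# After "love", pizza and cats are equally likely.
print(next_word_probs("love"))  # {'pizza': 0.5, 'cats': 0.5}
# After "i", "love" (seen twice) beats "like" (seen once).
print(next_word_probs("i"))
```

This is exactly the mechanism behind keyboard suggestions: no understanding, just relative frequencies. Modern models like GPT-3 replace the counting with a neural network, but the underlying task, predicting a probability distribution over the next word, is the same.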
Where are language models at in 2022
So far, we have discussed one of the simplest examples of a language model: predictive text. The reason language models have been attracting so much attention lately, and why companies are investing so much in them, lies in the possibilities that more complex language models, like GPT-3, open up.
These models started to garner interest after Google introduced the transformer architecture in 2017, revolutionizing the way neural networks understand natural language, i.e. what humans say and write.
All you really need to understand about the transformer architecture is that it opened the door for language models that are more creative than the ones that simply predict the next word you’re going to type into a text field. Language models that, given the right prompts, can actually generate text on their own, like GPT-3. As a matter of fact, the acronym GPT stands for “Generative Pre-trained Transformer”.
But let’s go back for a second to what happened after 2017. As the world started to understand what the transformer architecture could allow, more and more heterogeneous players decided to jump into the space. In the last few years, we’ve seen big companies like Google, Meta, and Microsoft, as well as research labs like DeepMind and OpenAI, and nonprofits join in.
OpenAI, an AI research startup initially funded by, among others, Elon Musk and Sam Altman, managed to assert itself as one of the frontrunners in the race. The release of GPT-3 in 2020 marked a crucial moment in AI history: with 175 billion parameters, it dwarfed predecessors like GPT-2 and BERT (it was roughly 100 times larger than GPT-2) and created new and higher expectations for where the field could lead.
Naturally, the other players soon followed with their own new and improved models, like Google’s LaMDA and PaLM, DeepMind’s Chinchilla, and Microsoft and Nvidia’s MT-NLG. All of these models, along with many others that keep popping up, are considered “foundation models”.
So, how are these foundation models used? Their goal is to serve as a basis that other individuals or businesses can build on for their own purposes. Many companies and startups are already selling solutions based on language models like GPT-3 to support activities like writing, translation, sales intelligence, customer support, content moderation, healthcare, and much more.
Is GPT-3 the best performing language model? PaLM, MT-NLG & Chinchilla have lined up
As mentioned, GPT-3 shed new light on language models when it came out in 2020. Two years later (geologic eras in AI time), the potential of GPT-3 is still widely admired.
But, is GPT-3 truly the most powerful language model in 2022?
Many other companies have been working and have released their own updated language models in this time. Let’s consider some of the most popular ones that have come out in the last two years:
- Google’s LaMDA (137 B parameters). You might remember this one for the story of the engineer who was eventually fired for claiming he had proof the AI was sentient. (Spoiler: it wasn’t.)
- Google’s PaLM (540 B). The largest dense language model created thus far, reaching state-of-the-art results across many benchmarks.
- Meta’s third version of BlenderBot (175 B). Let’s just say that as soon as the bot was released in the US in early August 2022, letting Facebook users chat with it, comments about how substandard it is flooded the web.
- Microsoft and Nvidia’s MT-NLG (530 B). Built by Nvidia together with Microsoft, which is now also OpenAI’s main backer, this model is often compared to PaLM and also looks very promising.
- DeepMind’s Chinchilla (70 B). Its minute size compared to the previous ones should not throw you off. Chinchilla has outperformed most of the other huge models, throwing out the “bigger = better” rule that most players had lived by until then.
As we can see from this brief list, not all of the new models that have come out have surpassed GPT-3, at least in terms of size.
The two that have, PaLM with 540 billion parameters and MT-NLG with 530 billion, deliberately scaled up their networks and achieved better performance as a result. Right now, many consider PaLM to be the state of the art.
Surprisingly, the 70-billion-parameter Chinchilla has also revealed itself to be a powerful contender. This has prompted every player in the space to reconsider what they thought they knew about building better models: Chinchilla showed that performance is not merely a matter of parameter count, and that other factors, such as how much training data a model sees for a given compute budget, matter just as much.
To answer the question: no, GPT-3 is not the absolute best performing language model in 2022, as MT-NLG, Chinchilla, and PaLM have outperformed it. On the other hand, some more recent models, like Meta’s BlenderBot, released in August 2022, still perform worse than GPT-3.
This should also not be taken to mean that OpenAI has been outrun by Google and the other players for good. OpenAI has reportedly been working on a new GPT version. We don’t know much about what GPT-4 will be like yet, but Sam Altman has reportedly stated that the team is not so focused on its size and is instead determined to work on other factors, like optimality and alignment.