PaLM 2 Unpacked: Google's Leap Forward in AI Linguistics

While OpenAI is leading the way in generative AI development, many have accused Google of lagging behind. Not to be outdone, Google launched a new large language model, PaLM 2, at its 2023 Google I/O conference. Set to come in four different sizes for a range of applications, Google’s new LLM is apparently already powering several Google services, with much more to come.

What Is PaLM 2?

At Google I/O 2023, held on May 10, Google CEO Sundar Pichai revealed Google’s latest plaything: PaLM 2.

Short for Pathways Language Model 2, Google’s upgraded LLM is the second iteration of PaLM, the first version of which launched back in April 2022. Can’t remember PaLM? Well, at the time, it was big news and received heaps of interest for its ability to converse a little, tell basic jokes, and so on. Fast forward six months, and OpenAI’s GPT-3.5 blew everything out of the water, including PaLM.

Since then, OpenAI has launched GPT-4, a massive upgrade on GPT-3.5. While that newer model is being integrated into numerous tools, most notably Microsoft’s Bing AI Chat, Google is taking aim at OpenAI and GPT-4 with PaLM 2, hoping its upgraded LLM can close what appeared to be a significant gap. The Google Bard launch, after all, was hardly a roaring success.

Pichai announced that PaLM 2 will come in four different model sizes: Gecko, Otter, Bison, and Unicorn.

Gecko is so lightweight that it can work on mobile devices and is fast enough for great interactive applications on-device, even when offline. This versatility means PaLM 2 can be fine-tuned to support entire classes of products in more ways, to help more people.

Gecko can process around 20 tokens per second (tokens are the chunks of text, such as whole words or word fragments, that generative AI models read and produce), so it looks likely to be a game-changer for mobile-deployable AI tools.
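
To put that throughput in perspective, here is a rough back-of-the-envelope sketch in Python. Only the figure of 20 tokens per second comes from the article. The reply lengths and the rule of thumb of roughly 0.75 English words per token are assumptions made purely for illustration, not numbers published by Google.

```python
# Rough estimate of how long an on-device model would take to generate a reply.
# TOKENS_PER_SECOND is the approximate Gecko figure quoted in the article;
# everything else is an illustrative assumption, not a number from Google.

TOKENS_PER_SECOND = 20   # approximate throughput quoted in the article
WORDS_PER_TOKEN = 0.75   # assumed rule of thumb for English text


def generation_time_seconds(num_tokens: int,
                            tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Return the seconds needed to generate num_tokens at the given throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


def approx_words(num_tokens: int,
                 words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Return a rough word count for num_tokens under the assumed ratio."""
    return round(num_tokens * words_per_token)


if __name__ == "__main__":
    for tokens in (20, 100, 400):  # hypothetical reply lengths
        seconds = generation_time_seconds(tokens)
        print(f"{tokens} tokens (~{approx_words(tokens)} words): {seconds:.1f} s")
```

Under these assumptions, a short answer arrives in a few seconds, while a reply of several hundred words would take closer to 20 seconds, which suggests the speed suits brief interactive exchanges better than long-form generation.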

PaLM 2 Training Data

Google wasn’t exactly forthcoming about PaLM 2’s training data, which is understandable given that the model had only just been released. But Google’s PaLM 2 Report [PDF] did say that the company wanted PaLM 2 to have a deeper understanding of mathematics, logic, and science, and that a large part of its training corpus focused on these topics.

Still, it’s worth noting that PaLM was no slouch. When Google revealed PaLM, it confirmed that the model had 540 billion parameters, which at the time was a colossal figure.

OpenAI’s GPT-4 is alleged to use over one trillion parameters, with some speculation putting that figure as high as 1.7 trillion. Google has not published a parameter count for PaLM 2, however, and its report describes the largest PaLM 2 model as significantly smaller than the largest PaLM model, so raw parameter count may not be where Google intends to compete with OpenAI’s LLMs.

Another significant boost to PaLM 2 is its language training data. Google has trained PaLM 2 on text in over 100 languages to give it greater depth and contextual understanding and to improve its translation capabilities.

But it’s not just spoken languages. In line with Google’s push for PaLM 2 to deliver better scientific and mathematical reasoning, the LLM has also been trained on more than 20 programming languages, which makes it a phenomenal asset for programmers.

PaLM 2 Is Already Powering Google Services—But Still Requires Fine Tuning

It won’t be long until we can get our hands on PaLM 2 and see what it can do. With any luck, the launch of PaLM 2 applications and services will go better than Bard’s did.

But you may have (technically!) used PaLM 2 already. Google confirmed PaLM 2 is already deployed and in use across 25 of its products, including Android, YouTube, Gmail, Google Docs, Google Slides, Google Sheets, and more.

But the PaLM 2 report also reveals that there is still work to be done, specifically on reducing toxic responses across a range of languages.

For example, when specifically given toxic prompts, PaLM 2 generates toxic responses more than 30 percent of the time. Furthermore, in specific languages—English, German, and Portuguese—PaLM 2 delivered toxic responses more than 17 percent of the time, with prompts including racial identities and religions pushing that figure higher.

No matter how much researchers attempt to cleanse LLM training data, it’s inevitable that some toxic content will slip through. The next phase is to continue training PaLM 2 to reduce those toxic responses.

It’s a Boom Period for Large Language Models

OpenAI wasn’t the first to launch a large language model, but its GPT-3, GPT-3.5, and GPT-4 models undoubtedly lit the blue touchpaper on generative AI.

Google’s PaLM 2 has some issues to iron out, but the fact that it is already in use across several Google services shows the confidence the company has in its latest LLM.