Google introduces faster AI model ‘Gemini 1.5 Flash’

CALIFORNIA, UNITED STATES — Google unveiled its latest AI models at its annual developer conference, Google I/O. The highlight of the event was the introduction of Gemini 1.5 Flash, the newest addition to Google’s Gemini series.
This model is touted as Google’s lightest and most efficient AI, capable of quickly summarizing conversations, captioning images and videos, and extracting data from large documents and tables.
Capabilities of Gemini 1.5 Flash
Gemini 1.5 Flash is geared towards “narrow, high-frequency, low-latency tasks,” making it a good fit for applications like real-time customer service responses or rapid image captioning. However, it is not yet available for consumer use; instead, developers can access it through Google AI Studio.
Like Gemini 1.5 Pro, Gemini 1.5 Flash is multimodal, capable of processing text, images, and videos. This versatility lets developers build applications that handle a wide range of inputs and deliver more engaging and immersive experiences.
“Multimodality radically expands the questions we can ask, and the answers we’ll get back. Long context takes this a step further, enabling us to bring in even more information: hundreds of pages of text, hours of audio or an hour of video, entire code repos… or, if you want, roughly 96 Cheesecake Factory menus,” explained Sundar Pichai, CEO of Alphabet.
Both models offer a context window of up to 1 million tokens, well beyond the 128,000-token limit of GPT-4 Turbo.
Keeping pace with OpenAI
Google’s move to unveil its latest AI models comes just a day after OpenAI launched its own GPT-4o.
OpenAI’s GPT-4o is touted as being twice as fast as GPT-4 Turbo and half the cost. It also supports 50 different languages and is available through OpenAI’s application programming interface (API) for developers.
In comparison, Google’s Gemini 1.5 Pro supports 35 languages, and Google is expanding its context window, a measure of how much information the model can process at once, from 1 million to 2 million tokens.
“It offers the longest context window of any foundational model yet,” said Pichai. He illustrated this with an example of a parent asking Gemini to summarize all recent emails from their child’s school, demonstrating the model’s practical utility.