
Google has recently unveiled ‘Gemini,’ its latest AI model, which the company touts as the most advanced and versatile in its class. Google describes Gemini as a multimodal model: it can comprehend text, images, audio, video, and more, and it can work through intricate tasks in domains such as math and physics. It can also understand and generate high-quality code in a range of programming languages.
Versions in Gemini AI
Google describes Gemini as a flexible model that is capable of running on everything from Google’s data centers to mobile devices. As part of the initial release, it is available in three different versions: Gemini Nano, Pro, and Ultra.
- Gemini Nano: Built for smartphones, Nano will be available first on the Google Pixel 8 Pro. It is designed for on-device tasks such as smart replies, summarizing recordings and emails, enhanced video and photography, and image editing.
- Gemini Pro: Running in Google’s data centers, Gemini Pro is being rolled out across Google products, starting with the chatbot Bard. It is designed to scale across a wide range of tasks, delivering fast response times and handling complex queries.
- Gemini Ultra: The largest and most capable model, intended for highly complex tasks. It is currently completing extensive trust and safety checks, and Google plans to roll it out to developers and enterprise customers early next year. As per a Google blog post, Gemini Ultra’s performance exceeds current state-of-the-art results on 30 of the 32 widely used academic benchmarks used in large language model (LLM) research and development.
Gemini AI Key Capabilities
Gemini’s key capabilities are:
- Sophisticated reasoning: Gemini’s multimodal reasoning capabilities help it make sense of complex written and visual information. It can extract insights from hundreds of thousands of documents by reading, filtering, and understanding information, which Google says will help deliver breakthroughs at digital speeds in many fields.
- Understanding text, images, audio, and more: Gemini 1.0 was trained to recognize and understand text, images, audio, and more at the same time, so it better understands nuanced information and can answer questions relating to complicated topics.
- Advanced coding: Gemini can understand, explain, and generate high-quality code in the most popular programming languages, like Python, Java, C++, and Go. Its ability to work across languages and reason about complex information makes it one of the leading foundation models for coding in the world.
- More reliable, scalable, and efficient: According to Google, Gemini runs significantly faster than earlier, smaller, and less-capable models.
Gemini AI Model Availability
Gemini 1.0 is now rolling out across a range of products and platforms:
- Gemini Pro in Google products: A fine-tuned version of Gemini Pro is available in Bard for more advanced reasoning, planning, understanding, and more. It is available in English in more than 170 countries and territories, and Google plans to expand it to more modalities, languages, and locations.
- Gemini Nano in Pixel: Gemini Nano will be available in Pixel 8 Pro.
- Other Google products: In the coming months, Gemini will be available in more of Google’s products and services, like Search, Ads, Chrome, and Duet AI.
- Building with Gemini: Starting on December 13, developers and enterprise customers can access Gemini Pro via the Gemini API in Google AI Studio or Google Cloud Vertex AI.
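As a rough illustration of the API access described above, the following Python sketch builds a text prompt request for Gemini Pro using only the standard library. The endpoint path, the `gemini-pro` model name, the `x-goog-api-key` header, and the response layout are assumptions based on the public REST interface at launch and may have changed, so check the current Gemini API documentation before relying on them. The helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumed values: verify the endpoint and model name against current docs.
API_BASE = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-pro"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{API_BASE}/models/{MODEL}:generateContent"
    # A single-turn prompt: one content entry holding one text part.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
        },
        method="POST",
    )


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the text of the first candidate.

    Assumes the response JSON has the shape
    candidates[0].content.parts[0].text.
    """
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        payload = json.load(resp)
    return payload["candidates"][0]["content"]["parts"][0]["text"]
```

With an API key created in Google AI Studio, a call such as `generate("Summarize what a multimodal model is.", api_key)` would send the prompt and return the model's reply as a string. Separating `build_request` from `generate` makes it possible to inspect the request without making a network call.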
The video below demonstrates the capabilities of Gemini 1.0.