Sundar Pichai, CEO of Google, has unveiled Gemini, the tech giant's latest artificial intelligence (AI) model. The Alphabet-owned company's new model comes in three versions: Pro, Ultra, and Nano. The Pro version is already available, and the Ultra version is scheduled for release early next year.
“Developers are using our models and infrastructure to build new generative AI applications, and startups and enterprises around the world are growing with our AI tools,” says Pichai.
“And we continue to invest in the very best tools, foundation models and infrastructure and bring them to our products and to others, guided by our AI Principles,” he adds.
Gemini's Accessibility
Google has integrated Gemini Pro into its chatbot Bard, a direct competitor to OpenAI's ChatGPT. Gemini-powered Bard currently supports text-based interactions only, and Google has promised support for other modalities in the future. The update is available in 170 countries and territories, but only in English for now.
Multimodal Capabilities
Pichai states that Gemini is designed as a multimodal model, meaning it can generalise across and seamlessly combine text, images, and other data types, which could enhance its conversational abilities.
“With the image benchmarks we tested, Gemini Ultra outperformed previous state-of-the-art models, without assistance from object character recognition (OCR) systems that extract text from images for further processing. These benchmarks highlight Gemini’s native multimodality and indicate early signs of Gemini's more complex reasoning abilities,” the post states.
Utilisation of Tools and APIs
Gemini comprises a series of models that vary in size and capability. It may use memory functions, fact-checking against sources such as Google Search, and enhanced reinforcement learning to improve accuracy and reduce the risk of generating misleading content.
“Additionally, we’re continuing to address known challenges for models such as factuality, grounding, attribution and corroboration,” the post adds.
Gemini, Google's most powerful AI model to date, is expected to significantly influence the AI industry. It powers various applications and devices, including the Bard chatbot and the Pixel 8 Pro.
“We’re also bringing Gemini to Pixel. Pixel 8 Pro is the first smartphone engineered to run Gemini Nano, which is powering new features like Summarize in the Recorder app and rolling out in Smart Reply in Gboard, starting with WhatsApp,” the post states.
According to the post, “With a score of 90.0%, Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.”
Gemini was built to be natively multimodal: it was pre-trained on different modalities from the outset and then fine-tuned with additional multimodal data to further refine its effectiveness.
This approach enables Gemini to understand and reason about diverse inputs, surpassing existing multimodal models. According to the post, its capabilities are state of the art in nearly every domain.
The post states, “Using a specialised version of Gemini, we created a more advanced code generation system, AlphaCode 2, which excels at solving competitive programming problems that go beyond coding to involve complex math and theoretical computer science.”
“To limit harm, we built dedicated safety classifiers to identify, label and sort out content involving violence or negative stereotypes,” it adds.