What is Gemini? Everything you should know about Google's new AI model

Google recently released its most powerful AI model yet, but what can it do?
Written by Maria Diaz, Staff Writer

What is Google Gemini?

Gemini is a powerful artificial intelligence (AI) model from Google that can understand text, images, videos, and audio. As a multimodal model, Gemini is described as capable of completing complex tasks in math, physics, and other areas, and understanding and generating high-quality code in various programming languages. 

It is currently available through the Gemini chatbot (formerly Google Bard) and on some Google Pixel devices, and it will gradually be folded into other Google services. During Google I/O 2024, the company announced new features coming to Gemini, including a new 'Live' mode and integrations with Project Astra. Gemini also powers AI Overviews in Google Search.

"Gemini is the result of large-scale collaborative efforts by teams across Google, including our colleagues at Google Research," said Demis Hassabis, CEO and co-founder of Google DeepMind, when announcing Gemini.

"It was built from the ground up to be multimodal, which means it can generalize and seamlessly understand, operate across, and combine different types of information including text, code, audio, image, and video."

Who made Gemini?

Gemini was created by Google and Alphabet, Google's parent company, and released as the company's most advanced AI model to date. 

Google DeepMind also made significant contributions to the development of Gemini. 

Are there different versions of Gemini?

Google describes Gemini as a flexible model capable of running on everything from Google's data centers to mobile devices. To achieve this level of scalability, Gemini initially launched in three sizes: Gemini Nano, Gemini Pro, and Gemini Ultra. Google has since added a fourth, Gemini Flash.

  • Gemini Nano 1.0: The Gemini Nano model is designed to run on smartphones and initially launched on the Google Pixel 8. It's built to perform on-device tasks that require efficient AI processing without connecting to external servers, such as suggesting replies within chat applications, understanding images, or summarizing text. The Gemini Nano model features a 32,000-token context window.
  • Gemini Flash 1.5: This model is built for speed, so it's a lightweight and cost-efficient option. The model features a long context window, with a one-million token context by default, enough to process an hour of video or over 30,000 lines of code. 
  • Gemini Pro 1.5: Running on Google's data centers, Gemini Pro is designed to power the latest version of the company's paid AI chatbot service, Gemini Advanced. This model can deliver fast response times and understand complex queries. Google just upgraded its context window to two million tokens, the longest of any large-scale model available now. 
  • Gemini Ultra 1.0: Google describes Gemini Ultra as its most capable model, exceeding "current state-of-the-art results on 30 of the 32 widely-used academic benchmarks used in large language model (LLM) research and development." It's designed for highly complex tasks and is available through Vertex AI and Google AI Studio with the Gemini API.
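
To make the context-window figures above more concrete, here is a minimal Python sketch that checks which of the listed models could hold a given input. The window sizes come from the list above (no figure is quoted for Gemini Ultra, so it is left out). The four-characters-per-token ratio and the helper names `estimate_tokens` and `models_that_fit` are assumptions made for illustration only. They are not part of any Google API, and real token counts depend on the model's tokenizer.

```python
# Context-window sizes in tokens, as quoted in the list above.
CONTEXT_WINDOWS = {
    "Gemini Nano 1.0": 32_000,
    "Gemini Flash 1.5": 1_000_000,
    "Gemini Pro 1.5": 2_000_000,
}

# ASSUMPTION: roughly 4 characters per token. This is a rule of thumb,
# not the behavior of Gemini's actual tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def models_that_fit(token_count: int) -> list[str]:
    """Return the models whose context window can hold token_count tokens."""
    return [
        name
        for name, window in CONTEXT_WINDOWS.items()
        if token_count <= window
    ]


if __name__ == "__main__":
    # A hypothetical long prompt, built by repeating a short sentence.
    prompt = "Summarize this report. " * 10_000
    tokens = estimate_tokens(prompt)
    print(f"Estimated tokens: {tokens}")
    print(f"Models that fit: {models_that_fit(tokens)}")
```

Under these assumptions, an input that overflows Gemini Nano's 32,000-token window can still fit comfortably within the one-million and two-million-token windows of Flash and Pro.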

How can you access Gemini?

The fastest way to use the Gemini model is to go to the AI chatbot's website, Gemini.Google.com. You can have a conversation with Gemini through this site like you can with ChatGPT and other AI chatbots.

The Gemini model is available in Google products, like Android-powered devices, the Gemini mobile app, Google searches with an AI overview, Google Photos, and more. Google plans to integrate Gemini further into its Search, Ads, Chrome, and other services. 

Developers and enterprise customers can access Gemini Ultra via the Gemini API in Google's AI Studio and Google Cloud Vertex AI. Android developers have access to Gemini Nano via AICore.

How does Gemini differ from other AI models, like GPT-4?

Google's new Gemini model appears to be the largest, most advanced AI model to date, though that will only be certain once the Ultra model is widely released. Compared to other popular models that power AI chatbots, Gemini stands out for its native multimodality and its long context window of one million tokens.

GPT-4, by comparison, is available in 8k and 32k token contexts. 

[Chart: Gemini Ultra and Pro vs. GPT-4. A comparison chart from Google shows how Gemini Ultra and Pro compare to OpenAI's GPT-4 and Whisper, respectively. Credit: Google/ZDNET]

Compared to GPT-4, a primarily text-based model, Gemini performs multimodal tasks natively. GPT-4 excels at language-related tasks, such as content creation and complex text analysis. At the time of testing, however, it relied on OpenAI's plugins to analyze images and access the web, and on DALL-E 3 and Whisper to generate images and process audio.

This approach could change when OpenAI makes GPT-4o widely available: ChatGPT would no longer rely on three separate models to perform these actions and would instead use a single omnimodel.

Google's Gemini also appears to be more product-focused than other available models. Google has either already integrated Gemini into its ecosystem or plans to, and the model powers both the chatbot and Android devices. Other models, like GPT-4 and Meta's Llama, are more service-oriented and are available to third-party developers for building applications, tools, and services.
