Flash 1.5, Gemma 2 and Project Astra

Contents

Significantly improving 1.5 Pro
Gemini Nano understands multimodal inputs

1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. This is because it has been trained by 1.5 Pro through a process called “distillation,” where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model.
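To make the idea of distillation concrete, here is a minimal sketch of one classic formulation: the student is penalized for diverging from the teacher's temperature-softened output distribution. This is an illustrative assumption, not a description of how 1.5 Flash was actually trained. The function names, the temperature value and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's.

    Hypothetical helper: a generic soft-target loss, not the recipe
    used for any specific Gemini model.
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.2]  # made-up logits from a larger model
    student = [2.5, 1.2, 0.1]  # made-up logits from a smaller model
    print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, which is what pushes the smaller model toward the larger model's behavior during training.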

Read more about 1.5 Flash on the Gemini technology page, and learn about 1.5 Flash’s availability and pricing. We’ll share more details in an updated Gemini 1.5 technical report soon.

Significantly improving 1.5 Pro

Over the last few months, we’ve significantly improved 1.5 Pro, our best model for general performance across a wide range of tasks.

Beyond extending its context window to 2 million tokens, we’ve enhanced its code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding through data and algorithmic advances. We see strong improvements on public and internal benchmarks for each of these tasks.

1.5 Pro can now follow increasingly complex and nuanced instructions, including ones that specify product-level behavior involving role, format and style. We’ve improved control over the model’s responses for specific use cases, like crafting the persona and response style of a chat agent or automating workflows through multiple function calls. And we’ve enabled users to steer model behavior by setting system instructions.
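As a rough illustration of what automating a workflow through multiple function calls can look like on the application side, the sketch below dispatches a list of model-requested calls to local handlers. It does not use the Gemini API. The call format (a dict with `name` and `args`), the handler functions and the sample data are all hypothetical stand-ins.

```python
def get_weather(city):
    """Hypothetical tool: returns a canned forecast for the given city."""
    return {"city": city, "forecast": "sunny"}


def create_event(title, date):
    """Hypothetical tool: pretends to add an event to a calendar."""
    return {"title": title, "date": date, "status": "created"}


# Registry mapping the function names a model may request to local code.
HANDLERS = {"get_weather": get_weather, "create_event": create_event}


def run_function_calls(calls, handlers=HANDLERS):
    """Execute each requested call in order and collect the results."""
    results = []
    for call in calls:
        handler = handlers.get(call["name"])
        if handler is None:
            # Report unknown names instead of raising, so the workflow continues.
            results.append({"name": call["name"], "error": "unknown function"})
            continue
        results.append({"name": call["name"], "result": handler(**call["args"])})
    return results


if __name__ == "__main__":
    # Made-up example of calls a model might request in a single turn.
    requested = [
        {"name": "get_weather", "args": {"city": "Paris"}},
        {"name": "create_event", "args": {"title": "Picnic", "date": "2024-06-01"}},
    ]
    for item in run_function_calls(requested):
        print(item)
```

In a real integration the results would typically be sent back to the model so it can compose a final answer. That round trip is omitted here.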

We added audio understanding in the Gemini API and Google AI Studio, so 1.5 Pro can now reason across image and audio for videos uploaded in Google AI Studio. And we’re now integrating 1.5 Pro into Google products, including Gemini Advanced and Workspace apps.

Read more about 1.5 Pro on the Gemini technology page. More details are coming soon in our updated Gemini 1.5 technical report.

Gemini Nano understands multimodal inputs

Gemini Nano is expanding beyond text-only inputs to include images as well. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand the world the way people do: not just through text, but also through sight, sound and spoken language.

Read more about Gemini 1.0 Nano on Android.
