Everything You Need to Know About Google's Gemini Model and Other AI Developments
In the world of AI, news can sometimes be slow, and sometimes it arrives all at once. In the last few days, we have had dramatic new leaked insights into the sheer breadth of Google's Gemini. Just today, we've had the release of Meta's Code Llama and, earlier, their impressive multilingual SeamlessM4T model. And last but definitely not least, this 88-page AI Consciousness report. And yes, I read it all. It's juicy, so I'm saving it for the end. But let's start with two major paywalled articles, one from The Information and one from The New York Times, about Google's Gemini model. Between them, I counted a total of nine new revelations, so let's get straight to it.
🚀 Gemini: The Everything Model
To give you a sense of timeline, by the way, Google's newly merged AI "SWAT team", as they call it, is preparing for a big fall (autumn) launch. The takeaway for me from both articles is that Gemini is going to be the everything model. Did you know it's going to be a rival to Midjourney and Stable Diffusion? Midjourney only has 11 full-time staff, so it is more than plausible that Google's Gemini could outperform Midjourney version 5. Next, we may be able to create graphics from just text descriptions and control software using only text or voice commands. These next two are speculations, so I'm not even counting them in the list of leaks. I've already covered in a previous video that Gemini has been trained on YouTube video transcripts, and the speculation is that by integrating video and audio into Gemini, it could perhaps help a mechanic diagnose a problem with a car repair based on a video, or rival Runway ML by generating video from text descriptions of what a user wants to see. You can start to see why I'm beginning to think of it as the everything model.
🤖 Meta's Code Llama
If the fall seems far away, how about today, when we got Code Llama from Meta? I spent much of the last two hours reading most of the 47-page paper, and you can see Code Llama in action on screen. Some highlights: the Code Llama models provide stable generations with up to 100,000 tokens of context. Obviously, that could be used for generating longer programs or for providing the model with more context from your codebase to make its generations more relevant. It comes in three versions: Code Llama; Code Llama - Instruct, which can better understand natural language instructions; and Code Llama - Python, better, of course, at Python. It's available for commercial use, and as you can see, some of the versions rival GPT-3.5 on HumanEval. That top score of 53.7% on pass@1 (the fraction of problems solved with a single generated sample) puts it in the same ballpark as phi-1. I've actually done a full video on phi-1, so do check that out; it scored 50.6%, but it is about 25 times smaller at 1.3 billion parameters. Interestingly, the Code Llama paper, which also came out about two hours ago, mentions phi-1 directly, saying that it follows in a similar spirit, but the difference is that phi-1 is closed source.
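Since those scores are reported as pass@1, it's worth seeing how pass@k is actually estimated. The sketch below uses the standard unbiased estimator from the original HumanEval paper: generate n samples per problem, count the c that pass the unit tests, and compute the chance that a random size-k subset contains at least one pass. This is the general metric, not anything specific to the Code Llama paper.

```python
# Unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021):
# pass@k = 1 - C(n - c, k) / C(n, k), averaged over all problems.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples generated per problem, c = samples that passed, k = budget."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples for one problem, 80 of them passed -> pass@1 estimate of 0.4
print(pass_at_k(n=200, c=80, k=1))
```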
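And as a quick illustration of how you might try the model locally, here is a minimal sketch using the Hugging Face transformers library. The checkpoint identifier is an assumption on my part (the weights themselves are distributed via Meta's request form), so treat this as a sketch rather than an official recipe.

```python
# Minimal sketch: code completion with Code Llama via Hugging Face transformers.
# The model ID below ("codellama/CodeLlama-7b-hf") is an assumption, not a
# detail confirmed by the paper or announcement.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",
)

# Give the model the start of a function and let it complete the body.
prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```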
🌐 Seamless M4T
SeamlessM4T was released a couple of days ago by Meta, and frankly it seems amazing for multilingual translation: speech-to-text, speech-to-speech, text-to-text, and more. It has speech recognition for nearly 100 languages and can output speech in 36 languages. But there's one feature I find particularly cool, demonstrated in Meta's own announcement video: "Now let's talk about code switching. Code switching happens when a multilingual speaker switches between languages while they are speaking. Our model, SeamlessM4T, automatically recognizes and translates more than one language when mixed in the same sentence by a multilingual speaker. This is a very exciting capability for me. I often switch from Hindi to Telugu when I speak with my dad. Notice in the following example when I change languages."
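If you want a rough sense of what using it looks like in code, here is a minimal sketch assuming the Hugging Face transformers port of SeamlessM4T; the checkpoint name and the three-letter language codes are assumptions drawn from that port, not details from Meta's announcement.

```python
# Minimal sketch: translating with SeamlessM4T via Hugging Face transformers.
# The checkpoint ID ("facebook/hf-seamless-m4t-medium") and the language codes
# ("eng", "hin") are assumptions from the transformers port.
from transformers import AutoProcessor, SeamlessM4TModel

processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")

# Text-to-text translation: English -> Hindi.
inputs = processor(text="Where is the nearest train station?",
                   src_lang="eng", return_tensors="pt")
tokens = model.generate(**inputs, tgt_lang="hin", generate_speech=False)
print(processor.decode(tokens[0].tolist()[0], skip_special_tokens=True))

# The same model can emit speech instead: drop generate_speech=False and the
# first return value is a waveform at the model's sampling rate.
audio = model.generate(**inputs, tgt_lang="hin")[0].cpu().numpy().squeeze()
```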
🧠 AI Consciousness Report
And finally, we come to the AI Consciousness report, which counts among its co-authors Yoshua Bengio, the Turing Award winner. It was dense and quite technical but well worth the read. To quote: "Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems that satisfy these indicators." These are the indicator properties, and each one gets a few pages in the report. The reason they're split up is that each rests on a particular theory of consciousness. Obviously, the key problem is that we don't have a consensus theory of what consciousness is or how it comes about. So, in a way, to hedge their bets, the authors group together several theories and look at the kind of indicators that would satisfy each one.
You might say that list seems awfully theoretical. Why not just test the model, or even ask it? For more on that approach, see my theory of mind video. But the problem, as they say on page four, is that "the main alternative to a theory-heavy approach is to use behavioral tests for consciousness." As I discussed in that video, that method is unreliable, because AI systems can be trained (and of course they are) to mimic human behaviors while working internally in very different ways. Essentially, LLMs have broken the traditional tests for consciousness, including, of course, the Turing test.
The paper also rests on the assumption of computational functionalism: essentially, that performing the right kind of computations is what matters for consciousness. It's not what you're made of, it's what you do. If this is wrong, and the substrate is in fact key, say biological cells, then it stands to reason that AI would never be conscious. But one of their early conclusions is that if computational functionalism is true (and it is a widely held view), conscious AI systems could realistically be built in the near term. Having digested the entire paper, I'd say they're strongly suggesting that we're not there yet, but that if this thesis is true, we could be, especially if researchers deliberately designed systems to meet these criteria. In fact, here is a key quote from one of the authors, speaking to Science alongside the report: "It would be trivial to design all of these features into an AI. The reason no one has done so is it is not clear that they would be useful for tasks." Now, to be honest, it is way beyond my pay grade to try to explain every aspect of the paper, but I'm going to try my best to convey the key bits.
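To make the shape of the report's method concrete (its structure, not its content), here is a toy sketch: a checklist of indicator properties grouped by the theory each derives from, plus a function that tallies which ones a given system satisfies. The theory names appear in the report; the short indicator wordings and the scoring scheme are my illustrative assumptions, not the authors' actual rubric.

```python
# Toy sketch of the report's "indicator properties" structure: each indicator
# is derived from a particular theory of consciousness. Theory names are from
# the report; the wordings and scoring here are illustrative assumptions only.
INDICATORS = {
    "Recurrent processing theory": [
        "input modules using algorithmic recurrence",
        "organized, integrated perceptual representations",
    ],
    "Global workspace theory": [
        "parallel specialist modules",
        "limited-capacity workspace with global broadcast",
    ],
    "Higher-order theories": [
        "metacognitive monitoring of perceptual outputs",
    ],
    "Agency and embodiment": [
        "learning from feedback to pursue goals",
        "modeling output-input contingencies",
    ],
}

def tally(satisfied: set[str]) -> None:
    """Print, per theory, how many of its indicators a system satisfies."""
    for theory, indicators in INDICATORS.items():
        hits = sum(ind in satisfied for ind in indicators)
        print(f"{theory}: {hits}/{len(indicators)}")

# Hypothetical assessment of some system:
tally({"parallel specialist modules"})
```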
📝 Highlights
- Google's Gemini model is going to be the everything model.
- Code Llama models provide stable generations with up to 100,000 tokens of context.
- SeamlessM4T can recognize and translate more than one language mixed in the same sentence.
- The AI Consciousness report suggests that, if computational functionalism is true, conscious AI systems could realistically be built in the near term.
❓ FAQ
Q: What is Google's Gemini model?
A: Gemini is Google's upcoming multimodal model, reportedly a rival to Midjourney and Stable Diffusion as well as a text model, which is why I'm calling it the everything model.
Q: What is Code Llama?
A: Code Llama is Meta's family of code-generation models (base, Instruct, and Python variants), available for commercial use, with stable generations of up to 100,000 tokens of context.
Q: What is SeamlessM4T?
A: SeamlessM4T is Meta's multilingual translation model, covering speech-to-text, speech-to-speech, and text-to-text. It has speech recognition for nearly 100 languages and can recognize and translate more than one language mixed in the same sentence.
Q: What does the AI Consciousness report suggest?
A: It suggests that no current AI systems are conscious, but that there are no obvious technical barriers to building systems that satisfy its indicator properties; if computational functionalism is true, conscious AI systems could realistically be built in the near term.