I gave Google Gemini 1.5 a video of the total eclipse and asked it to write a song — here’s what it sounds like
Google Gemini 1.5 Pro marks a game-changing moment for multimodal artificial intelligence. It lets you feed the model a video, audio or image file and ask questions about its contents.
To see how well it performs, I gave Gemini 1.5 Pro a completely silent video of the moment of totality from the recent total solar eclipse visible across North America.
Working in Vertex AI, Google's cloud AI platform, I was also able to give Gemini some instructions along with the video clip. I asked it to write lyrics, plus a prompt for an AI music generator, to create a song inspired by the contents of the video.
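For developers who want to try something similar programmatically rather than in the Vertex AI console, a call along these lines should work with the Vertex AI Python SDK. This is a minimal sketch, not the author's exact setup: the project ID, bucket path and exact model version string are placeholder assumptions you would replace with your own.

```python
# Sketch: asking Gemini 1.5 Pro on Vertex AI to write lyrics and a music prompt
# from a silent video clip. Requires the google-cloud-aiplatform package and
# authenticated Google Cloud credentials; all identifiers below are placeholders.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

# Placeholder project and region — substitute your own.
vertexai.init(project="your-project-id", location="us-central1")

# Model version string is an assumption; use whichever 1.5 Pro version is available.
model = GenerativeModel("gemini-1.5-pro-preview-0409")

# Video uploaded to a Cloud Storage bucket (placeholder path).
video = Part.from_uri("gs://your-bucket/totality.mp4", mime_type="video/mp4")

instruction = (
    "Watch this silent video of a total solar eclipse. Write song lyrics "
    "inspired by it, plus a short style prompt for an AI music generator."
)

# Gemini receives the video and the text instruction in a single request.
response = model.generate_content([video, instruction])
print(response.text)
```

The key multimodal piece is that the video file and the text instruction travel in the same `generate_content` request, so the model can ground its lyrics in what it actually sees in the clip.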
I then put the prompt and lyrics into Udio to create the song, fed the finished track back into Gemini 1.5 Pro, and asked it to listen to the music and plot out a music video.
What is Google Gemini 1.5 Pro?
Google released its Gemini family of models in December, starting with the tiny Nano, which runs on some Android phones. It then shipped Pro, which now powers the Gemini chatbot, and in February released the powerful, GPT-4-class Gemini Ultra.
The search giant then released its first update to the Gemini family, unveiling Gemini 1.5 Pro, which has a massive one-million-token context window, a mixture-of-experts architecture to improve responsiveness and accuracy, and true multimodal capabilities.
While it is currently available only to developers through an API call or the Vertex AI cloud platform, this advanced functionality is expected to come to the Gemini chatbot soon.