Meet “Muse”, a text-to-image model from Google AI
Amid the current AI trend sweeping the internet, Google has unveiled a new text-to-image tool called Muse. The transformer-based image generator is able to create high-quality images at record speed, making it faster and more efficient than many competitors. But what exactly does that mean?
A group from Google Research introduced Muse as a tool that is on par with most current models in output quality, while being significantly more efficient than existing diffusion models like Stable Diffusion and DALL-E 2, and even Google's own Parti. So what is this efficiency claim based on?
In extensive testing by Google AI, the researchers found that Muse delivers images of similar quality, much faster. Muse was evaluated against Parti-3B and Imagen, matching them in image quality, variety and text alignment, while standing out as significantly faster than both.
Muse generates an image in about 1.3 seconds, compared to roughly 3.7 seconds for Stable Diffusion, a significant speed advantage.
The research team achieved this speed by having Muse work in a compressed, discrete latent space and decode tokens in parallel. For text comprehension, Muse uses a frozen T5 language model, which means it processes the full text prompt rather than focusing on a few key words or phrases, giving it a fine-grained understanding of the input.
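The parallel decoding mentioned above is the key to the speed-up: instead of denoising a full image over many diffusion steps, the model fills in a grid of masked discrete image tokens, committing its most confident predictions in each of a small number of passes. As a rough illustration only (the names, schedule and numbers here are illustrative assumptions, and the "model" is a random stand-in rather than Muse's actual transformer), the idea can be sketched like this:

```python
import random

MASK = -1  # sentinel for a not-yet-decoded image token

def parallel_decode(num_tokens=16, steps=4, vocab_size=1024, seed=0):
    """Toy sketch of parallel decoding over discrete image tokens.

    Start fully masked and, over a few refinement steps, commit the
    most confident predictions in parallel. The "predictions" below
    come from a random stand-in, not a real model.
    """
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens
    for step in range(steps):
        remaining = tokens.count(MASK)
        # simple linear schedule: spread remaining tokens over remaining steps
        commit = max(1, remaining // (steps - step))
        # stand-in predictions: (confidence, position, token id) for masked slots
        preds = [(rng.random(), i, rng.randrange(vocab_size))
                 for i, t in enumerate(tokens) if t == MASK]
        # keep only the most confident predictions this pass
        for _, i, tok in sorted(preds, reverse=True)[:commit]:
            tokens[i] = tok
    return tokens

tokens = parallel_decode()
print(tokens.count(MASK))  # 0: every position decoded in just 4 passes
```

The point of the sketch is the shape of the loop: a 16-token "image" is fully decoded in 4 parallel passes, whereas an autoregressive model would need 16 sequential steps and a diffusion model typically needs many more.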
Muse's architecture also enables a new range of image-editing applications driven by text prompts: you can make changes to your generated images without building complex masks, using prompts alone.
In human evaluations, 70.6% of raters judged Muse's images to match the text input better than those from Stable Diffusion 1.4. Testers also found Muse above average at incorporating predefined words into images, and it has been shown to be more accurate in composition than many competitors, meaning it renders the elements of a prompt more exactly, e.g. three wine bottles or five yellow boxes.
The Muse team has pointed out that, depending on the use case, there is a "potential for harm". This is standard caution in scientific work on AI systems, especially those dealing with language and images: such a tool could reproduce social biases or be used maliciously to spread misinformation. For that reason, the team has decided not to publish the code for Muse, and has also held off on releasing a public demo, keeping Muse a closed model for now.
I still have no clear understanding of how to use Muse. It isn't publicly available as an app yet. Does it have a GUI?
Lots of random links provided, perhaps a link to something more relevant: http://muse-model.github.io/ would have made a bit more sense.
Far out! Everything is all cosmic again!