Build Multimodal Generative AI Applications
By Hailey Quach on Coursera
About This Course
Ready to level up your GenAI skills? Step into the exciting world of multimodal AI, where language, images, and speech come together to build smarter, more interactive applications. In this hands-on course, you’ll learn how to build systems that work across multiple modalities, from creating AI-powered storytellers and meeting assistants to developing image captioning tools and video generation apps. You’ll gain experience with real-world tools: IBM’s Granite; OpenAI’s Whisper, Sora, and DALL·E; Meta’s Llama; Mistral’s Mixtral; and Gradio. You’ll also explore multimodal search, question answering, and retrieval systems that combine text, speech, and visual data. By the end of the course, you’ll be able to design and build full-stack multimodal AI solutions using Python and frameworks like Flask and Gradio. If you’re looking to gain in-demand skills for building the next generation of AI applications, enroll today and power up your AI career!
Frequently Asked Questions
How much does Build Multimodal Generative AI Applications cost?
Visit the Build Multimodal Generative AI Applications course page for current pricing and available discounts.
Who teaches Build Multimodal Generative AI Applications?
Build Multimodal Generative AI Applications is taught by Hailey Quach of IBM.
What skill level is Build Multimodal Generative AI Applications for?
This course is designed for learners of all levels.
Similar Courses
HTML & CSS Coding for Beginners: Build your own portfolio!
Chris Dixon
Maya for Beginners: Animation
Lucas Ridley
JavaScript for Beginners (includes 6+ real life projects)
Kalob Taulien
Beginner Bootstrap 4: Hand code beautiful responsive websites fast
Chris Dixon