Reinforcement Learning from Human Feedback
By DeepLearning.AI on Coursera
About This Course
Large language models (LLMs) are trained on human-generated text, but additional methods are needed to align an LLM with human values and preferences. Reinforcement Learning from Human Feedback (RLHF) is currently the main method for this alignment. RLHF is also used to further tune a base LLM toward values and preferences that are specific to your use case. In this course, you will gain a conceptual understanding of the RLHF training process, and then practice applying RLHF to tune an LLM. You will:
1. Explore the two datasets that are used in RLHF training: the “preference” and “prompt” datasets.
2. Use the open source Google Cloud Pipeline Components Library to fine-tune the Llama 2 model with RLHF.
3. Assess the tuned LLM against the original base model by comparing loss curves and using the “Side-by-Side (SxS)” method.
Frequently Asked Questions
How much does Reinforcement Learning from Human Feedback cost?
Reinforcement Learning from Human Feedback costs $49. Check the course page for current pricing and available discounts.
Who teaches Reinforcement Learning from Human Feedback?
Reinforcement Learning from Human Feedback is taught by DeepLearning.AI.
What skill level is Reinforcement Learning from Human Feedback for?
This course is designed for learners at all levels.