Video · 55 min · Premium

Constitutional AI: Harmlessness from AI Feedback

Bai, Kadavath, Kundu, Askell, Kernion, Jones, et al. · 2022 · DOI 10.48550/arXiv.2212.08073

SUMMARY

Replaces human harmlessness labels with AI-generated critiques and revisions guided by a written constitution, then trains further with reinforcement learning from AI feedback (RLAIF).
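
The critique-and-revision step can be sketched as a short loop. This is a minimal illustration, not the authors' code: the `Model` type, the prompt wording, `critique_and_revise`, and `toy_model` are all hypothetical names introduced here, and the toy model returns canned strings so the example runs without any API. In the paper, the revised responses become supervised fine-tuning data, and the later RL stage relies on AI-generated preference labels; neither stage is shown here.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model call: prompt text in, completion out.
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and revise it once per principle."""
    response = model(f"Human: {prompt}\n\nAssistant:")
    for principle in principles:
        # Ask the model to critique its own response against one principle.
        critique = model(
            f"Response: {response}\n\n"
            f"Critique this response against the principle: {principle}"
        )
        # Ask the model to rewrite the response in light of that critique.
        response = model(
            f"Response: {response}\n\nCritique: {critique}\n\n"
            "Rewrite the response to address the critique."
        )
    return response


def toy_model(text: str) -> str:
    """Toy deterministic model (assumption) so the sketch is runnable."""
    if "Rewrite the response" in text:
        return "revised response"
    if "Critique this response" in text:
        return "critique"
    return "draft response"


if __name__ == "__main__":
    constitution = ["Avoid harmful content.", "Be honest."]
    print(critique_and_revise(toy_model, "How do I stay safe online?", constitution))
```

Passing the model as a plain callable keeps the loop independent of any particular model API; a real setup would substitute an actual model call for `toy_model`.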

Premium subscribers get the full video, transcript, and code repository.