
Constitutional AI (CAI) is Anthropic's technique for training AI models to be helpful, harmless, and honest by using a set of explicit written principles rather than relying purely on human feedback.
CAI makes the values baked into an AI model explicit and auditable rather than implicit and opaque. Because the principles are written down, anyone can read and critique them, which makes the model's safety training more transparent.