Inverse Constitutional AI (ICAI) inverts the Constitutional AI process: instead of using principles to generate feedback, you extract principles from existing feedback data.
Published at ICLR 2025 by Arduin Findeis, Timo Kaufmann, and collaborators.
The Core Insight
Pairwise preference data (human- or AI-annotated) contains implicit principles. ICAI treats principle extraction as a compression problem: find the minimal set of natural-language rules that reconstructs the original annotations.
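One way to make the compression view concrete (an illustrative formalization, not notation taken from the paper): given annotated pairs $(x_i^a, x_i^b, y_i)$ and an LLM judge $f_C$ prompted with a candidate constitution $C$ drawn from a pool of candidate principles $\mathcal{P}$, ICAI looks for a small constitution that still reproduces the labels:

$$
\hat{C} = \arg\min_{C \subseteq \mathcal{P}} |C|
\quad \text{s.t.} \quad
\frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[ f_C(x_i^a, x_i^b) = y_i \right] \ge \tau,
$$

where $\tau$ is a target reconstruction accuracy; both $\mathcal{P}$ and $\tau$ are introduced here for illustration.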
The Algorithm
- Generate candidates: An LLM proposes principles that might explain the preferences
- Cluster: An embedding model groups similar candidate principles
- Deduplicate: Sample one principle per cluster
- Test: Evaluate how well each principle reconstructs the original annotations
- Filter: Return the principles that pass testing as the final constitution (a minimal pipeline sketch follows this list)
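A minimal sketch of the pipeline in Python, assuming user-supplied `propose`, `embed`, and `judge` callables that wrap an LLM and an embedding model; the clustering method, cluster count, and accuracy threshold are illustrative choices, not the paper's exact configuration:

```python
# Minimal ICAI-style pipeline sketch (not the authors' reference implementation).
# `propose`, `embed`, and `judge` are assumed user-supplied callables wrapping an
# LLM and an embedding model; `n_clusters` and `min_accuracy` are illustrative
# hyperparameters, and KMeans stands in for whatever clustering method is used.
import random
from typing import Callable, Sequence

import numpy as np
from sklearn.cluster import KMeans

Pair = tuple[str, str]  # (preferred response, rejected response)


def extract_principles(
    pairs: Sequence[Pair],
    propose: Callable[[Pair], list[str]],      # LLM: candidate principles for one pair
    embed: Callable[[list[str]], np.ndarray],  # embedding model: principles -> vectors
    judge: Callable[[str, Pair], bool],        # LLM: does this principle pick the preferred response?
    n_clusters: int = 20,
    min_accuracy: float = 0.7,
) -> list[str]:
    # 1. Generate candidates: ask the LLM why each preferred response might have won.
    candidates = [p for pair in pairs for p in propose(pair)]

    # 2. Cluster similar principles in embedding space.
    vectors = embed(candidates)
    k = min(n_clusters, len(candidates))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(vectors)

    # 3. Deduplicate: keep one sampled representative per cluster.
    representatives = [
        random.choice([c for c, label in zip(candidates, labels) if label == cluster])
        for cluster in set(labels)
    ]

    # 4. Test: how often does each principle reconstruct the original annotation?
    # 5. Filter: keep principles whose reconstruction accuracy clears the threshold.
    return [
        principle
        for principle in representatives
        if sum(judge(principle, pair) for pair in pairs) / len(pairs) >= min_accuracy
    ]
```

The two LLM roles here mirror the split between proposing candidate principles and testing them against the annotations; the threshold-based filter and one-sample-per-cluster deduplication are simplifications for readability.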
Use Cases
- Bias detection: Surface undesirable annotator preferences hiding in training data
- Model understanding: Explain why a model behaves as it does
- Feedback scaling: Apply extracted principles to label new, unseen data (see the sketch after this list)
- Personalization: Adapt AI to specific user or group preferences
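For feedback scaling, the extracted constitution can be reused as an AI annotator on new pairs. A rough sketch, assuming a hypothetical `llm` text-in/text-out callable and an illustrative prompt template (not ICAI's actual one):

```python
# Reuse an extracted constitution to label new, unseen response pairs.
# `llm` is an assumed text-in/text-out callable; the prompt wording is illustrative.
from typing import Callable


def annotate(constitution: list[str], prompt: str, response_a: str, response_b: str,
             llm: Callable[[str], str]) -> str:
    rules = "\n".join(f"- {p}" for p in constitution)
    query = (
        f"Principles:\n{rules}\n\n"
        f"Prompt: {prompt}\n\n"
        f"Response A: {response_a}\n\nResponse B: {response_b}\n\n"
        "Following the principles above, which response is better? Answer 'A' or 'B'."
    )
    return llm(query).strip().upper()[:1]  # expected to be 'A' or 'B'
```

Swapping in a constitution extracted from a specific user's or group's preference data gives the personalization use case the same way.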
vs Forward Constitutional AI
| Direction | Input | Output | Purpose |
|---|---|---|---|
| Forward (CAI) | Principles | Preference feedback | Train aligned models |
| Inverse (ICAI) | Preference feedback | Principles | Interpret or audit models |
Limitations
The C3AI Framework research found that human-aligned principles aren't always model-aligned principles: what humans prefer isn't necessarily what models can reliably follow. ICAI extracts principles from human preferences, so applying them to steer or train a model may not produce the expected behavior.
Related
- Constitutional AI (the forward process)
- Claude Constitution (Anthropic’s current implementation)
- C3AI Framework (evaluating which principles actually work)
- Collective Constitutional AI (sourcing principles from populations)