Noteworthy Differences

AI alignment for detecting meaningful changes

The challenge: Documents are constantly updated, but users only want notifications for significant changes. Training AI systems to detect what humans consider noteworthy requires careful alignment.

The solution: A two-stage AI alignment pipeline that combines classifier disagreement detection with human-in-the-loop annotation to create aligned AI judges.
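A minimal sketch of the disagreement-routing idea follows. It assumes two toy heuristic classifiers, the hypothetical `length_classifier` and `token_classifier`; the project's actual classifiers are not described here. Examples on which every classifier returns the same label are labeled automatically. Only the disputed examples are queued for human annotation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Example:
    old_text: str
    new_text: str


# Hypothetical classifier signature: (old_text, new_text) -> True if the change is noteworthy.
Classifier = Callable[[str, str], bool]


def length_classifier(old: str, new: str, threshold: float = 0.2) -> bool:
    """Toy heuristic: flag changes that alter the text length by more than `threshold`."""
    return abs(len(new) - len(old)) / max(len(old), 1) > threshold


def token_classifier(old: str, new: str, threshold: float = 0.3) -> bool:
    """Toy heuristic: flag changes whose word-set Jaccard distance exceeds `threshold`."""
    old_tokens, new_tokens = set(old.split()), set(new.split())
    union = old_tokens | new_tokens
    if not union:
        return False
    jaccard = len(old_tokens & new_tokens) / len(union)
    return 1 - jaccard > threshold


def route_examples(examples: list[Example], classifiers: list[Classifier]):
    """Auto-label examples the classifiers agree on; queue disagreements for humans."""
    auto_labeled = []      # (example, shared label) pairs
    needs_annotation = []  # hard examples for human annotators
    for ex in examples:
        votes = [clf(ex.old_text, ex.new_text) for clf in classifiers]
        if len(set(votes)) == 1:
            auto_labeled.append((ex, votes[0]))  # unanimous: keep the shared label
        else:
            needs_annotation.append(ex)  # disagreement: route to a human
    return auto_labeled, needs_annotation


examples = [
    Example("The cat sat.", "The cat sat."),
    Example("Price is 10 dollars.", "Price is 12 dollars."),
    Example("Short.", "A completely rewritten and much longer paragraph."),
]
auto, disputed = route_examples(examples, [length_classifier, token_classifier])
print(len(auto), len(disputed))  # 2 1
```

The price edit shows why disagreement is a useful signal. The length heuristic sees no change, while the word-overlap heuristic flags it. That split marks it as a hard example that is worth a human label.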

Technical achievements:

Two-stage architecture with classifiers and judge models for robust change detection
Disagreement-based annotation that focuses human effort on hard examples where the classifiers disagree (only 8-9% of cases)
16% improvement in test accuracy for the heuristic-aligned judge over the unaligned baseline
Confidence estimation based on agreement levels among the classifiers and the judge
Production-ready Gradio interface for real-time noteworthy difference detection
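The exact mapping from agreement to confidence is not specified above. The following is only a sketch under an assumed three-level scheme, using the hypothetical function `agreement_confidence`. Confidence is high when every classifier agrees with the judge, medium when the judge sides with some of them, and low when the judge overrules all of them.

```python
def agreement_confidence(classifier_votes: list[bool], judge_vote: bool) -> str:
    """Map agreement between the classifiers and the judge to a coarse confidence level.

    The three levels and their thresholds are assumptions for illustration only.
    """
    if not classifier_votes:
        raise ValueError("need at least one classifier vote")
    agreeing = sum(vote == judge_vote for vote in classifier_votes)
    if agreeing == len(classifier_votes):
        return "high"    # every classifier agrees with the judge
    if agreeing > 0:
        return "medium"  # the judge sides with some, but not all, classifiers
    return "low"         # the judge overrules every classifier


print(agreement_confidence([True, True], True))    # high
print(agreement_confidence([True, False], False))  # medium
print(agreement_confidence([True, True], False))   # low
```

A finer-grained alternative would return the agreeing fraction itself, `agreeing / len(classifier_votes)`, and leave thresholding to the interface.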

Jeffrey Dick
