AI alignment for detecting meaningful changes

[Figure: Noteworthy Differences workflow]


The challenge: Documents are constantly updated, but users only want notifications for significant changes. Training AI systems to detect what humans consider noteworthy requires careful alignment.

The solution: A two-stage AI alignment pipeline that combines classifier disagreement detection with human-in-the-loop annotation to create aligned AI judges.
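The disagreement-detection stage can be pictured with a minimal sketch. This is an illustration only, not the project's code: `route_pairs`, `by_length`, and `by_words` are hypothetical names, and the two toy heuristics stand in for whatever classifiers the pipeline actually uses. Pairs where the classifiers agree keep their label; pairs where they disagree are set aside for human annotation.

```python
from typing import Callable, List, Tuple

# A classifier takes (old_text, new_text) and returns True if the change is noteworthy.
Classifier = Callable[[str, str], bool]


def by_length(old: str, new: str) -> bool:
    """Toy heuristic (assumption): a large change in length is noteworthy."""
    return abs(len(new) - len(old)) > 10


def by_words(old: str, new: str) -> bool:
    """Toy heuristic (assumption): two or more differing words is noteworthy."""
    return len(set(old.split()) ^ set(new.split())) >= 2


def route_pairs(
    pairs: List[Tuple[str, str]],
    classifier_a: Classifier,
    classifier_b: Classifier,
) -> Tuple[List[Tuple[str, str, bool]], List[Tuple[str, str]]]:
    """Split revision pairs into agreed (labelled) and disputed (needs a human)."""
    agreed, disputed = [], []
    for old, new in pairs:
        a = classifier_a(old, new)
        b = classifier_b(old, new)
        if a == b:
            agreed.append((old, new, a))
        else:
            disputed.append((old, new))
    return agreed, disputed


pairs = [
    ("The cat sat.", "The cat sat. "),
    ("Release date: May 1", "Release date: June 1"),
    ("Short note.", "Short note. This paragraph was added with many new details."),
]
agreed, disputed = route_pairs(pairs, by_length, by_words)
```

In this toy run only the date change is disputed: its length barely moves, but two words differ, so it is the kind of example that would be sent to an annotator.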

Technical achievements:

  • Two-stage architecture with classifiers and judge models for robust change detection
  • Disagreement-based annotation that focuses human effort on the hard examples where classifiers disagree (only 8-9% of cases)
  • 16% improvement in test accuracy for the heuristic-aligned judge over the unaligned baseline
  • Confidence estimation based on agreement levels among classifiers and judge
  • Production-ready Gradio interface for real-time noteworthy difference detection
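
One simple way to turn agreement levels into a confidence score is sketched below. It assumes the judge's vote is the final label and that confidence is the share of all models (classifiers plus judge) that match it. Both the function name `agreement_confidence` and this scoring rule are assumptions for illustration; the project may define confidence differently.

```python
from typing import Sequence


def agreement_confidence(classifier_votes: Sequence[bool], judge_vote: bool) -> float:
    """Fraction of all votes (classifiers plus judge) that match the judge's label.

    Assumption: the judge's vote is treated as the final decision.
    """
    votes = list(classifier_votes) + [judge_vote]
    matching = sum(1 for vote in votes if vote == judge_vote)
    return matching / len(votes)


full = agreement_confidence([True, True], True)      # every model agrees
partial = agreement_confidence([True, False], True)  # one classifier dissents
low = agreement_confidence([False, False], True)     # judge overrides both classifiers
```

With two classifiers and one judge the score takes only three values (1, 2/3, 1/3), which map naturally onto high, medium, and low confidence labels in a user interface.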