Just found an incredible guide by Hamel Husain on building an LLM-as-a-judge! Super insightful, especially since we’re using a similar system at Ada to evaluate transcript resolutions. Excited about how smartly it avoids blind spots in test coverage!
The post explores the importance of thinking probabilistically when working with LLMs, covering effective eval methodologies, the quirks of model behavior, and practical tips for building robust evaluation processes that go beyond traditional testing.