Blog - David Hariri

Saturday November 16th, 2024

Only one LLM is good at chess

Exploring how different LLMs perform at chess: most of them fail, with the exception of turbo-instruct. Also discusses how tuning and training may influence the results.

LLMs
evals
ML

Saturday November 2nd, 2024

Just found an incredible guide on building LLMs as a judge by Hamel Husain! Super insightful, especially since we’re using a similar system at Ada to evaluate transcript resolutions. Excited about how well it helps avoid blind spots in test coverage!

LLMs
evals

Thursday October 17th, 2024

Working Probabilistically

This post explores why it's important to think probabilistically when working with LLMs. It covers effective eval methodologies, the quirks of model behavior, and practical tips for building robust evaluation processes that go beyond traditional testing.

LLMs
evals
