Posts about LLMs

Only one LLM is good at chess

Exploring how different LLMs perform at chess: most fail, with turbo-instruct the one exception. Discusses how tuning and training may influence the results.


Creating an LLM-as-a-Judge

Just found an incredible guide by Hamel Husain on building an LLM-as-a-judge! Super insightful, especially since we’re using a similar system at Ada to evaluate transcript resolutions. Excited about how smartly the approach avoids blind spots in test coverage!


Anthropic Computer Use

Just tried out Anthropic's Computer Use demo in a Docker setup! It can control a virtual machine and run tasks like adding a knowledge base for our bots. Super impressive, but it did trip up on some commands and interactions. Excited to see where this tech goes!


Working Probabilistically

This post explores why thinking probabilistically matters when working with LLMs. It covers effective eval methodologies, the quirks of model behavior, and practical tips for building robust evaluation processes that go beyond traditional testing.


LLM Generated Descriptions

I share a recent enhancement to my website's intake endpoint that uses an LLM to automatically generate short descriptions for my blog posts. By integrating OpenAI's API, I can now create a summary whenever I upload new content. I discuss the process behind the implementation, how well it works on past examples, and my plans to add a feature that generates tags based on existing ones. Dive in to learn how AI is changing the way I present my ideas online!


Using LLMs in Production

A nod to Will Larson's post on using LLMs in production and some additional notes based on my own experience.


Did Code Win?

Some brief thoughts on the future of no-code in a world where code generation is ubiquitous.


LLMs as a New Kind of Computer

Thoughts on Beren's post about LLMs as a new form of computer.