Why we measure cost in spans, not tokens
Tokens describe what the model did; spans describe what your agent did. A short essay on which one we picked to put on the invoice, and why we don't regret it yet.
Long-form notes from the people building Cairn. Mostly about agents, evals, and the unglamorous parts of running a production stack.
Tokens describe what the model did; spans describe what your agent did. A short essay on which one we picked to put on the invoice, and why we don't regret it yet.
Most teams write five expensive evals and run them once a week. We made the opposite trade: hundreds of small evals, every single deploy.
What changed in our paging volume after we put eval suites in the deploy path. Mostly graphs, with a few asides about sleep.
The word does a lot of work in this industry. Here is the specific shape of the thing Cairn is built to observe, and the shapes it is intentionally bad at.
Faster trace search, a new replay UI, and Python SDK 1.0. The release notes are short on purpose.
A note on craft, restraint, and why our marketing page is a photograph of a valley and not an illustration of a brain.
Watch how a single run flows from SDK call to trace, eval, and dashboard.