What it is
Langfuse is an open-source observability and eval platform for LLM apps. Trace inference calls, attach scores, define datasets, run experiments, and manage prompts — all in one self-hostable service. Free if you self-host; cloud starts at $29/month with usage-based pricing.
Developer experience
SDKs in Python and TS, plus OpenTelemetry and OpenAI/LangChain integrations that "just work." Drilling into a multi-step agent run feels closer to a real APM than what most eval-first competitors offer.
```ts
import { Langfuse } from "langfuse";

const lf = new Langfuse(); // reads LANGFUSE_SECRET_KEY etc. from the environment
const trace = lf.trace({ name: "triage" });
const gen = trace.generation({ name: "classify", model: "gpt-4o" });
const output = "spam"; // stand-in for the actual model response
gen.end({ output });
```
Where it shines
- Self-hosting. Helm chart, docker-compose, and a SOC 2-compliant cloud offering — pick your flavor. This is the differentiator for teams that can't ship customer data to a third-party SaaS.
- Tracing. Best-in-class for debugging real agent traffic. Session grouping connects related requests cleanly.
- Pricing. The OSS version is free to self-host, and the cloud tier (from $29/month, usage-based) is reasonable.
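To make the self-hosting bullet concrete, here is a heavily simplified compose sketch. Service names and settings are illustrative only; the compose file shipped in the Langfuse repo, which also provisions additional dependencies such as ClickHouse in v3, is the source of truth:

```yaml
# Illustrative sketch only; use the compose file from the Langfuse repo for real deployments.
services:
  langfuse:
    image: langfuse/langfuse:latest
    ports:
      - "3000:3000"
    environment:
      DATABASE_URL: postgresql://postgres:postgres@db:5432/langfuse
      NEXTAUTH_URL: http://localhost:3000
      NEXTAUTH_SECRET: changeme  # generate a real secret
      SALT: changeme
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: langfuse
```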
Where it falls short
- Evals UX. Functional but less opinionated than Braintrust. You'll spend more time wiring things together to get a CI-gated eval flow.
- Scale ops. Self-hosting at high trace volume needs a real ClickHouse story.
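On the evals point: the CI gate you end up wiring yourself can be as simple as scoring outputs and failing the build below a threshold. A minimal sketch, where `scoreOutput` and `gateEvals` are hypothetical helpers (not Langfuse APIs) and an exact-match scorer stands in for a real judge:

```typescript
// Stand-in scorer: exact match. In practice you'd use an LLM judge
// or pull scores back from Langfuse after a dataset run.
function scoreOutput(output: string, expected: string): number {
  return output.trim() === expected.trim() ? 1 : 0;
}

// Gate: pass only if the mean score clears the threshold.
function gateEvals(scores: number[], threshold: number): boolean {
  if (scores.length === 0) return false;
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  return mean >= threshold;
}

// In CI you would exit non-zero when the gate fails:
// if (!gateEvals(scores, 0.85)) process.exit(1);
```

Braintrust ships this loop pre-built; with Langfuse you assemble it from the pieces above plus its datasets and scores APIs.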
Bottom line
If self-hosting matters or you want OSS, Langfuse is the obvious choice and a credible alternative to closed-source incumbents. If you'd pay anything to skip the ops work and want the most polished eval flow out of the box, look at Braintrust first — but Langfuse is the one we'd start with for almost any team that takes data control seriously.