LLM Observability
CRAFT uses two complementary observability systems that operate at different layers:

- OpenTelemetry, infrastructure-level: HTTP requests, database queries, Redis, service health
- Langfuse, LLM-level: model calls, token usage, cost, prompt quality, evaluation results
Why LLM Observability Is Different
Traditional observability (metrics, traces, logs) was designed for deterministic systems. LLM-powered applications introduce unique observability challenges:

| Challenge | Why it matters |
|---|---|
| Non-determinism | The same prompt can produce different outputs. Standard error rates don’t capture quality degradation. |
| Token economics | Cost is proportional to token usage, not request count. A single call can cost as much as $1.00, depending on prompt length. |
| Prompt engineering | Changing a prompt is a configuration change, not a code change, but it can dramatically affect output quality. |
| Evaluation | “Is this response correct?” requires domain-specific evaluation, not just latency or error rate checks. |
| Multi-turn context | A user session spans multiple LLM calls. Standard tracing doesn’t capture the logical conversation flow. |
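
To make the token economics row concrete, here is a minimal sketch of how the cost of one call follows from its token counts rather than from the fact that a request was made. The model name `example-model` and its per-million-token prices are hypothetical placeholders, not real provider pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Price in USD per one million tokens (hypothetical values)."""
    input_per_million: float
    output_per_million: float


# Assumed price table for illustration only; real prices vary by provider.
PRICES = {
    "example-model": ModelPrice(input_per_million=3.0, output_per_million=15.0),
}


def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single LLM call from its token counts."""
    price = PRICES[model]
    total = (
        prompt_tokens * price.input_per_million
        + completion_tokens * price.output_per_million
    )
    return total / 1_000_000


# Two calls count as one request each, but differ widely in cost.
short_call = call_cost("example-model", prompt_tokens=1_000, completion_tokens=200)
long_call = call_cost("example-model", prompt_tokens=200_000, completion_tokens=2_000)
```

Under these assumed prices, the long-context call costs roughly a hundred times more than the short one, which is why request-count dashboards alone say little about spend.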
What Each System Captures
OpenTelemetry (Infrastructure)
| Signal | Examples |
|---|---|
| Metrics | Request rate, P99 latency, error rate, pod CPU/memory |
| Traces | End-to-end request flow through services |
| Logs | Service logs, error messages, audit events |
Langfuse (LLM)
| Signal | Examples |
|---|---|
| Traces | Complete LLM session with all turns, tools, and context |
| Metrics | Tokens used, estimated cost, latency per model call |
| Evaluations | LLM-as-a-judge quality scores, human labels, rubric results |
| Prompts | Version history, usage statistics, A/B comparison |
Architecture
Integration Pattern
The platform uses LiteLLM as a provider-agnostic LLM proxy. LiteLLM natively supports Langfuse as a callback handler, requiring no changes to application code. When LANGFUSE_HOST is set, LiteLLM automatically:
- Records each LLM API call to Langfuse (prompt, completion, model, tokens, cost)
- Groups calls into sessions by conversation ID
- Reports evaluation scores if evaluators are configured
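
The behavior above can be approximated at the LiteLLM SDK level as follows. This is a sketch under assumptions, not the platform's actual wiring: the helper names `langfuse_callbacks`, `session_metadata`, and `configure_litellm` are hypothetical, and the `"langfuse"` callback name and the `session_id` / `trace_user_id` metadata keys follow LiteLLM's Langfuse integration as commonly documented, so check them against the LiteLLM version you run.

```python
import os


def langfuse_callbacks(env=os.environ):
    """Return the LiteLLM callback names to enable for this environment.

    Mirrors the rule described above: Langfuse logging is enabled only
    when LANGFUSE_HOST is set to a non-empty value.
    """
    return ["langfuse"] if env.get("LANGFUSE_HOST") else []


def session_metadata(conversation_id, user_id=None):
    """Build per-call metadata so calls in one conversation share a session."""
    metadata = {"session_id": conversation_id}
    if user_id is not None:
        metadata["trace_user_id"] = user_id
    return metadata


def configure_litellm():
    """Apply the callback list to LiteLLM (requires the litellm package)."""
    import litellm  # imported lazily so the helpers above work without it

    litellm.success_callback = langfuse_callbacks()
    return litellm
```

A call would then pass the metadata along, for example `litellm.completion(model=..., messages=..., metadata=session_metadata("conv-123", "alice"))`. The Langfuse credentials (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`) are expected in the environment as well.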
Langfuse Trace Anatomy
A Langfuse trace for a Data Insights session might look like:
Cost Tracking
Langfuse aggregates LLM costs across all calls, enabling:

- Cost per conversation / session / user
- Cost trends over time
- Model comparison (cost vs. quality tradeoff)
- Budget alerting (configurable thresholds)
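
The roll-ups listed above amount to grouping per-call cost records by a chosen dimension and comparing the totals to a threshold. The sketch below illustrates that idea on in-memory records; the record fields (`session_id`, `user_id`, `model`, `cost_usd`), the sample values, and the function names are assumptions for illustration and do not reflect the Langfuse API or data model.

```python
from collections import defaultdict


def cost_by(calls, key):
    """Sum cost_usd over call records, grouped by the given field."""
    totals = defaultdict(float)
    for call in calls:
        totals[call[key]] += call["cost_usd"]
    return dict(totals)


def over_budget(totals, threshold_usd):
    """Return the group keys whose total cost exceeds the threshold."""
    return sorted(k for k, total in totals.items() if total > threshold_usd)


# Hypothetical call records, shaped like rows exported from a cost report.
calls = [
    {"session_id": "s1", "user_id": "alice", "model": "model-a", "cost_usd": 0.25},
    {"session_id": "s1", "user_id": "alice", "model": "model-b", "cost_usd": 0.5},
    {"session_id": "s2", "user_id": "bob", "model": "model-a", "cost_usd": 0.125},
]

per_session = cost_by(calls, "session_id")
per_user = cost_by(calls, "user_id")
per_model = cost_by(calls, "model")
alerts = over_budget(per_session, threshold_usd=0.5)
```

Grouping by `model` gives the cost side of the cost vs. quality comparison; the quality side would come from evaluation scores, which this sketch does not model.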
Related
Langfuse
Deploy and configure Langfuse.
OpenTelemetry
Infrastructure observability with OTel.

