Documentation Index
Fetch the complete documentation index at: https://docs.emergence.ai/llms.txt
Use this file to discover all available pages before exploring further.
Debugging Agents
Agent failures are rarely obvious from error messages alone. This page covers how to inspect traces, analyse tool calls, iterate on prompts, and diagnose the most common failure modes.Toolbox
Before debugging, confirm you have access to these tools:| Tool | Purpose |
|---|---|
| Langfuse | Trace inspection, span hierarchy, token usage, LLM input/output |
ADK Web (adk web) | Interactive ADK agent debug UI — replay conversations, inspect tool calls |
| curl / jq | Direct A2A JSON-RPC invocation for isolated testing |
Agent Card (/.well-known/agent-card.json) | Verify capabilities and skills are declared correctly |
| Application logs | Structured JSON logs with task_id, context_id, user_id for correlation |
Reading a Trace in Langfuse
Every agent request produces a trace in Langfuse. The span hierarchy for a Pydantic AI agent looks like:agent.response_length— if 0, the agent produced no output (likely an error)mcp.tool.status—successor error classificationmcp.tool.result_size_bytes— large values (>50KB) indicate context bloat riskgen_ai.usage.input_tokens/gen_ai.usage.output_tokens— token budget
Symptom Index
Agent not responding — SSE stream closes immediately with no events
Agent not responding — SSE stream closes immediately with no events
Most likely causes:
-
Task validation failure — the request is missing
task_id,context_id, or authenticated user context. Check application logs forValueError: Task ID and Context ID must be provided. -
Authentication error — the JWT token is missing or invalid. If your executor validates the
Authorizationheader before processing the task, a missing or invalid token raises before emitting any events. -
MCP connection failure — the
StreamableHttpTransportcannot connect to the MCP server. Check thatMCP_SERVER_URLis set and the MCP pod is healthy.
Tool calls failing — agent loops or returns 'I couldn't complete that'
Tool calls failing — agent loops or returns 'I couldn't complete that'
Most likely causes:
-
MCP server returning errors — the tool call reaches the MCP server but returns a structured error. Check
mcp.tool.statusin the trace span. -
Iteration limit hit — the agent has exceeded
max_code_failures. Look for log line:Iteration limit reached for task {task_id}. -
Forbidden operation — the tool call uses a restricted pattern (e.g., filesystem access, blocked import). Look for
LINT_ERRORin the tool result. -
Tool schema mismatch — the LLM is passing incorrect argument types. Check the
mcp.tool.param_fingerprintacross calls — if it’s consistent and always failing, the tool schema is wrong.
Context overflow — LLM refuses to respond or responses degrade
Context overflow — LLM refuses to respond or responses degrade
Symptoms: Agent responses become shorter, less accurate, or the LLM refuses to call tools. Token usage approaches the model’s context limit.Most likely causes:Fix: Implement side payload interception in your toolset. Strip large data blobs (DataFrames, images, Plotly figures) before they reach the LLM context window; store them in the Assets API and pass only the resource URI.
- Tool results too large — a tool is returning large payloads (DataFrames, Plotly figures) directly to the LLM. Your toolset should strip large fields and store them in the Assets API, passing only the resource URI.
-
Conversation history too long — the task store is loading the full conversation history. Check
agent.context_metrics.history_messagesin the trace. -
System prompt too large — the instruction builder is including too much context. Check
agent.context_metrics.instruction_length.
Cost runaway — token usage or tool call count far exceeds expectations
Cost runaway — token usage or tool call count far exceeds expectations
Symptoms: A single agent request consumes 10x the expected tokens. The LLM is looping on tool calls or generating excessively long responses.Most likely causes:Fix:
-
Missing iteration limit — no
max_code_failuresguard on code execution tool. The LLM keeps trying different code variations. -
Tool always returning errors — the LLM keeps retrying a broken tool. Check
mcp.tool.statusacross the trace — allerrorwith the same tool name is a signal. -
Infinite delegation loop — two agents are delegating to each other. Check the orchestrator’s
sub_agentslist for circular references. -
Large system prompt being rebuilt per turn — the instruction builder is fetching context on every LLM round-trip. Check
agent.context_metrics.instruction_build_duration_s.
Datasource not found — text2sql or data agents return state=failed
Datasource not found — text2sql or data agents return state=failed
Symptoms:
state=failed with message “No datasource DataPart found” or “resource_uri resolution failed”.Most likely causes:-
Missing DataPart in the message — the caller did not include a
DataPartwithtype: "datasource". The A2A message must include both aTextPartand aDataPart. -
Wrong resource_uri format — the
resource_urimust be in the full four-segment format:data:{org_id}:{project_id}:{name}. The simplified format (data:my-db) is not accepted. -
Missing selected_schemas —
selected_schemasis empty or absent. Text2SQL requires exactly one schema entry.
Prompt Iteration
The fastest way to improve agent quality is iterating on the system prompt. Use ADK Web or direct A2A calls to test prompt changes without redeploying.ADK Web — Interactive Replay (Google ADK)
Claude Agent SDK — Prompt Replay
For Claude-based agents, replay prompt variations using the Anthropic SDK directly without running the full A2A server:anthropic.messages.create spans in Langfuse. Tool use blocks appear as child spans with input and output fields.
LangGraph — Debug with debug=True
LangGraph’s astream supports verbose debug output and LangSmith/Langfuse tracing:
langfuse_handler callback (see Eval Harness) — spans appear as langgraph:node:<name> entries in Langfuse.
Minimal Repro with Direct curl
For non-ADK agents, replay failing conversations directly:Prompt Change Checklist
Before changing the system prompt:- Identify the specific behaviour to change (use a Langfuse trace as evidence)
- Write a test case that captures the failure
- Make the minimum prompt change needed to fix the test case
- Run the full regression suite to check for new regressions
- Re-check token usage — prompt changes can inflate or deflate input token cost
Structured Logging for Correlation
All CRAFT agents log structured JSON withtask_id and context_id. Use these to correlate logs with Langfuse traces:
Next Steps
Eval Harness
Build regression suites to catch issues before they reach production.
Langfuse Setup
Configure Langfuse tracing for your agent.

