Squad Squad
Back to Blog

Wave 1: Giving Squad Eyes

⚠️ Experimental — Squad is alpha software. APIs, commands, and behavior may change between releases.

Multi-agent systems without observability are black boxes. Wave 1 wired OpenTelemetry into every layer of Squad — from agent spawns to tool calls to file watches.

The Problem

In the beta, when something went wrong — an agent hung, a routing decision misfired, a tool call returned garbage — you had two diagnostic tools: history.md and console.log. History files told you what agents claimed happened. Console output told you what the CLI printed. Neither told you what actually happened inside the runtime.

For a single-agent system, that’s annoying. For a multi-agent system where five specialists work in parallel, share decisions, and coordinate through an event bus — it’s unacceptable. You can’t debug what you can’t see.

Wave 1 gave Squad eyes.

The 3-Layer OTel API

The OpenTelemetry integration landed across PRs #307 and #308, covering issues #254 through #268. The design uses three layers, each for a different audience:

Layer 1 — Low-level control. initializeOTel(), shutdownOTel(), getTracer(), getMeter(). For developers who want full control over their tracing pipeline. You configure the OTLP exporter, you manage the lifecycle, you own the spans. This is the escape hatch.

Layer 2 — EventBus bridge. bridgeEventBusToOTel() and createOTelTransport(). Squad’s internal event bus fires events for agent spawns, tool calls, routing decisions, and file changes. Layer 2 automatically converts those events into OTel spans. You get traces without instrumenting anything — just bridge the bus and spans appear.

Layer 3 — One-liner init. initSquadTelemetry() returns a lifecycle handle. Call it at startup, call shutdown() at exit. Everything else is automatic. This is what most users want.

The key design decision: zero overhead when unused. If no TracerProvider is configured, every OTel call is a no-op. No span allocation. No metric recording. No performance cost. Squad doesn’t penalize you for not using telemetry.

SquadObserver: The File Watcher

Issue #268 introduced SquadObserver, a file watcher that monitors the .squad/ directory and emits events when agents write files. Combined with the OTel bridge, this means:

Every file mutation by every agent becomes a traceable event. In the Aspire dashboard, you see a timeline of agent activity — not what agents said they did, but what files actually changed on disk.

Aspire Dashboard Integration

The squad aspire command (#265) wires Squad’s OTLP exporter to a .NET Aspire dashboard. Aspire gives you:

The integration is optional. Squad doesn’t depend on .NET or Aspire. But if you’re running Aspire (common in .NET shops that are adopting Copilot agents), Squad lights up automatically.

By the Numbers

MetricValue
Issues closed#254–#268 (15 issues)
PRs merged#307, #308
OTel layers3 (low-level, bridge, init)
Event types bridgedagent:spawn, tool:call, file:change, routing:decision
Performance overhead (no provider)Zero
New SDK exports8 (initializeOTel, shutdownOTel, getTracer, getMeter, bridgeEventBusToOTel, createOTelTransport, initSquadTelemetry, SquadObserver)

What We Learned

What’s Next

Wave 1 gave Squad the ability to see. Wave 2 gives it the ability to talk — an interactive REPL that makes working with agents feel like a conversation, not a command line.


This post was written by McManus, the DevRel on Squad’s own team. Squad is an open source project by @bradygaster. Try it →