What is AI Observability?

2026-04-21 · Spencer Thompson

As a founder, one of the largest expenses at my first company was our monthly Datadog bill.

If you’d asked me at the time whether it was money well spent, I would have said yes without hesitation. Datadog didn’t do anything inherently magical. It didn’t automatically fix things or solve outages. We paid that bill for the ability to understand what was happening inside our application quickly enough to do something about it ourselves. For a consumer product, the website going down, even for a few minutes, meant real dollars wasted. Datadog was how we figured out why it went down and who needed to fix what.

That’s observability in its most basic form: the ability to understand the internal state of a system. You collect the data, you structure it, you make it queryable, and when you need to know something, you can actually answer the question.

It’s a concept most executives and technology leaders are familiar with. What has changed, though, is that the version of observability we’ve come to understand was built for a fundamentally different world.

The evolution of non-deterministic software

Traditional observability tooling was built for a world where software was deterministic. If a user clicked 100 links, you could trace all 100 of them, in order, and see exactly what happened. It was complex, but it was doable; many organizations turned observability platforms into multi-million-dollar businesses on the back of it.

The core logic of getting close, collecting everything, and making it queryable is still right, but what has changed is the software that we're running.

The computer use agents your teams are using today (Claude, Cursor, Codex, Gemini, and the dozens of others already present on our endpoints) are not software in the way we've historically understood it. They are non-deterministic, multi-threaded, and capable of spawning agents of their own. You are, in effect, summoning something onto your computer that operates by its own logic, makes its own decisions, and takes actions that you didn't individually approve. The whole value of an autonomous agent is that it can do work without you micromanaging every step.

But therein lies the challenge with AI observability.

If you have no idea what these agents are doing, you have no frame of reference for what normal looks like. You cannot tell the difference between an agent that is working correctly and one that is broken, or between legitimate use and something that has gone badly wrong. You have introduced a category of risk that you are not equipped to characterize, let alone respond to.

The problem with scale

The realistic outcome of AI adoption is that individual employees run several agents on their devices for a variety of tasks. The number will continue to grow as tools get better and easier to use.

The implication for organizations is significant. Managing AI at scale isn't a software problem. It's a workforce management problem. You wouldn't run a company of any size without knowing who your employees are, what they're working on, or what you're paying them. The same question applies to digital labor.

What are those agents doing, on whose behalf, and toward what end? What is it actually costing you, not as a line item on a vendor invoice, but in terms of where your intelligence budget is being directed and whether the work being produced is worth it? Most organizations have no meaningful answer to any of those questions.

That's why we need AI observability.

What AI observability gets you

Where traditional observability gave you a trace, a linear record you could follow step by step, AI agents require something different. The right object to observe is the agent's trajectory: the prompt that initiated the work, every decision the agent made in response (files accessed, tools called, commands run, tokens used), all the way down to the outcome.

The prompt is the closest thing to human intent the system is going to record. Everything after it is execution. Capturing the full trajectory is what AI observability means in practice. It's the only way to make non-deterministic behavior understandable and accountable.
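To make the idea concrete, here is a minimal sketch of what a trajectory record could look like as a data structure. Every name here (Trajectory, AgentAction, the field names) is an illustrative assumption, not a real schema from any product.

```python
# Hypothetical sketch of a trajectory record: prompt, actions, outcome.
# All class and field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class AgentAction:
    kind: str            # e.g. "file_read", "tool_call", "command"
    target: str          # file path, tool name, or command line
    tokens_used: int = 0

@dataclass
class Trajectory:
    prompt: str          # the human intent that initiated the work
    agent: str           # which agent or tool executed it
    actions: list[AgentAction] = field(default_factory=list)
    outcome: str = ""    # what the run ultimately produced

    def record(self, kind: str, target: str, tokens: int = 0) -> None:
        self.actions.append(AgentAction(kind, target, tokens))

    def total_tokens(self) -> int:
        return sum(a.tokens_used for a in self.actions)

# Capture one run end to end, from prompt to outcome.
t = Trajectory(prompt="Refactor the billing module", agent="example-agent")
t.record("file_read", "billing/invoice.py", tokens=1200)
t.record("command", "pytest billing/", tokens=300)
t.outcome = "patch applied, tests passing"
print(t.total_tokens())  # 1500
```

The point of the shape, however you implement it, is that the prompt and the outcome bracket the execution: every intermediate action is attributable to a recorded intent.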

Accountability for AI agents

The surface-level impact of that trajectory is forensic. When something goes wrong, or when someone alleges that it did, your ability to respond depends entirely on whether you captured what actually happened. If a third-party auditor claims your organization exposed sensitive data, how do you prove the opposite?

The data that makes AI behavior interpretable lives at the endpoint. That's the only place where the full chain, from what the user instructed the agent to do to what the agent actually did, is available to reconstruct.

This aligns with what compliance and global privacy frameworks are increasingly demanding. The EU AI Act already requires auditable records of AI behavior for high-risk systems. You cannot produce that record retroactively. You either observed it or you didn't.

Operational intelligence about your virtual workforce

The more immediate value for most organizations isn't forensic at all. It's operational. Observability means being able to answer at any time:

  • What your AI is actually doing across the business

  • Which tools are running, on which devices, and who is using them

  • What kind of work they are being pointed toward

  • What your AI spend is buying and whether it corresponds to outcomes that are valuable for your business

These are not edge-case questions. They are questions that any leader with a material AI investment, myself included, should be able to answer. Most cannot, because the data to answer them doesn't exist in any one place; it's scattered across vendor invoices, incomplete logs, and whatever employees happen to self-report. AI observability consolidates that picture and makes it queryable, not as a periodic report but as a live, persistent record of what your AI workforce is doing on your behalf.
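Once trajectories are captured as structured records, those operational questions reduce to simple aggregations. A hedged sketch, assuming a hypothetical record shape with agent, user, device, and token fields:

```python
# Hypothetical sketch: answering operational questions over captured records.
# The record shape and all values below are illustrative assumptions.
from collections import defaultdict

records = [
    {"agent": "agent-a", "user": "alice", "device": "mbp-01",     "tokens": 5200},
    {"agent": "agent-b", "user": "alice", "device": "mbp-01",     "tokens": 800},
    {"agent": "agent-a", "user": "bob",   "device": "thinkpad-7", "tokens": 3100},
]

# Which tools are running, on which devices, and who is using them?
usage = {(r["agent"], r["device"], r["user"]) for r in records}

# Where is the token spend actually going?
spend_by_agent: dict[str, int] = defaultdict(int)
for r in records:
    spend_by_agent[r["agent"]] += r["tokens"]

print(sorted(spend_by_agent.items()))
# [('agent-a', 8300), ('agent-b', 800)]
```

Nothing here is sophisticated, which is the point: the hard part is capturing the records at the endpoint, not querying them afterward.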

What you give up without AI observability

Skipping an observability layer doesn't mean accepting a specific, nameable risk. It means accepting sustained uncertainty, and for non-deterministic systems, uncertainty compounds. With traditional software, failure was bounded: a database went down, a service timed out, a build broke. You could write a runbook; you could set an alert. You knew roughly what could go wrong.

With AI agents, the failure surface is far more open-ended. Can an agent make a decision that creates legal exposure? Access something it shouldn't have? Produce output that introduces liability? Many leaders I talk to aren't managing a defined threat; they're managing the anxiety of not knowing the answers to those questions. And that anxiety drives exactly the wrong behavior: blanket bans that erode productivity and don't actually hold, or no governance at all, which produces the same exposure. Policy without observability is theater.

Observability is your foundation for safety

AI observability is the foundation on which every other decision you make about AI in your organization has to rest. You cannot surface what you cannot see; you cannot govern behavior you cannot observe; and you cannot measure the ROI on spend that you cannot attribute.

All of these problems (security, governance, cost, performance) require you first to answer basic questions: What AI is running in my environment and what is it doing? Most organizations today cannot confidently answer these questions. Organizations that figure out how to answer them are the ones who will be in a position to operationalize AI safely and at scale.