Exploring the Broken Audit Trail for Artificial Intelligence
Your AI agent has finished the work. It shipped code, summarized research, drafted a competitive analysis. You have the prompt and the output, but you have limited visibility into everything that happened in between: the decisions made, files opened, tools invoked, and paths taken on the way from instruction to outcome.
Many users aren’t likely to care. They got what they needed. On an organizational level, though, this represents a black box that presents massive productivity and safety challenges to adopting and operating AI at scale.
Your digital workforce operates without a paper trail
The productivity case for AI is largely settled. Regardless of function, organizations are deploying artificial intelligence to maximize efficiency and productivity, often with significant investment. The expectation is that this investment produces measurable output at scale.
Yet a recent Writer report found that nearly half of executives consider their investment in AI a disappointment.
As a user, I’m generally capable of knowing what I’ve accomplished with AI. I’m less capable of explaining what it did to produce those outcomes. The same can be said for my team of engineers. I know they’re using AI and what they’re building with it, but it’s challenging to enumerate the actions those agents took: what files they touched, what they modified, and in what sequence. The work happened, but the record of how it happened didn’t.
This is not a logging failure in the traditional sense. It’s an architectural problem that directly affects our ability to get the most out of AI and to do so safely. With AI, malicious intent is not required for something to go wrong, and there are any number of examples in recent weeks.
The truth is in the trajectory
To understand the problem, you need to understand what an AI agent actually does between the prompt and the output, commonly referred to as an agent trajectory.
A trajectory is the complete, ordered record of an agent’s run: the initial instruction, every reasoning step the agent takes, every tool it calls, every piece of information it retrieves, every file it touches, and every action it executes, in the order they occur. It’s the full chain of context from human intent to machine action.
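As a rough illustration of that definition, a trajectory can be modeled as an instruction plus an ordered list of typed steps. This is a minimal sketch, not the schema of any particular product: the class names, the step kinds, and the sample run are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    seq: int      # position of the step within the run, starting at 1
    kind: str     # hypothetical kinds: "reasoning", "tool_call", "file_access", "action"
    detail: str   # free-text description of what happened


@dataclass
class Trajectory:
    instruction: str                          # the initial human instruction
    steps: List[Step] = field(default_factory=list)

    def record(self, kind: str, detail: str) -> Step:
        """Append a step, assigning the next sequence number."""
        step = Step(seq=len(self.steps) + 1, kind=kind, detail=detail)
        self.steps.append(step)
        return step

    def files_touched(self) -> List[str]:
        """Return the files accessed, in the order they were accessed."""
        return [s.detail for s in self.steps if s.kind == "file_access"]


# Hypothetical run, invented for illustration.
run = Trajectory(instruction="Fix the failing login test")
run.record("reasoning", "Locate the test and the code it exercises")
run.record("file_access", "tests/test_login.py")
run.record("file_access", "app/auth.py")
run.record("tool_call", "pytest tests/test_login.py")
run.record("action", "Edit app/auth.py")

for step in run.steps:
    print(step.seq, step.kind, step.detail)
```

The point of the sequence number is that order is part of the record: the same five steps in a different order would be a different trajectory.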
Unlike traditional software, AI agents are non-deterministic. Giving the same agent the same prompt twice may result in completely different paths to a functionally similar output. The entire trajectory can vary based on context, retrieved memory, how a tool responds, or prior state. In deterministic software, you can log the result and reconstruct the path, because the path is always the same. With agents, the path varies from run to run, so the output alone cannot tell you how it was produced.
This is what makes traditional observability or monitoring structurally insufficient for AI work. Current technology may observe the edges of the conversation (the prompt going in, the response coming out) and still know almost nothing about what happened in between.
[Trajectory]
That interior is where risk lives, and not just from a security perspective. It’s where token consumption happens, where file access occurs, and where inefficient usage accumulates. And yet, we have very little visibility into this critical journey.
Existing tools are incapable of unpacking this black box. API-level logging gives you a record that a call was made, including the model, the token count, the timestamp, and maybe some additional metadata. Network monitoring tells you that traffic moved between a device and a known AI endpoint. These are all real signals, but they are all incomplete for the same reason: they observe the edges of AI activity, not the trajectory.
Consider what Claude Code does when it is actually working. It reads files across the system, executes shell commands, and makes decisions about which files to modify and which to leave alone. It reasons through a problem across potentially dozens of intermediate steps before producing any visible output. None of this appears, let alone in sequence, in API logs or network traces. The audit trail of what it did while it was running simply does not exist in any of the tools we currently use.
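To make the gap concrete, the sketch below contrasts what an edge-level record holds with the interior of two runs. It is only an illustration under stated assumptions: the field names, the `edge_view` helper, and both sample runs are hypothetical and do not reflect the log format of any real API, proxy, or agent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Run:
    model: str
    prompt: str
    response: str
    total_tokens: int
    steps: Tuple[str, ...]   # the interior: what the agent did, in order


def edge_view(run: Run) -> Dict[str, object]:
    """What an edge-level log would hold in this sketch: no steps at all."""
    return {
        "model": run.model,
        "prompt": run.prompt,
        "response": run.response,
        "total_tokens": run.total_tokens,
    }


# Two hypothetical runs with the same prompt and the same visible response.
run_a = Run("example-model", "Rename the config loader", "Done: loader renamed", 1200,
            ("read config.py", "edit config.py", "run tests"))
run_b = Run("example-model", "Rename the config loader", "Done: loader renamed", 5400,
            ("read config.py", "read secrets.env", "run shell: grep -r loader .",
             "edit config.py", "edit deploy.sh", "run tests"))

# Ignoring token counts, the two edge views are identical.
text_a = {k: v for k, v in edge_view(run_a).items() if k != "total_tokens"}
text_b = {k: v for k, v in edge_view(run_b).items() if k != "total_tokens"}
same_at_the_edge = text_a == text_b

# Only the trajectory shows what the second run did differently.
only_in_b = [s for s in run_b.steps if s not in run_a.steps]

print(same_at_the_edge)
print(only_in_b)
```

In this toy data the edge view shows that the second run cost more tokens, but not that it read an extra file or modified a second one. That detail exists only in the step list.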
Why we need to care
“I’ve achieved my goal and my users are efficient with AI.” We may believe this (though it is unlikely), but it doesn’t change why we need to capture an agent’s trajectory.
In the same way you would want to know if your employee was reviewing sales contracts to write advertising copy, you should know how your digital labor is interfacing with your environment. Is it being efficient? Is it wasting tokens answering questions irrelevant to its query? Could better context have reduced the requirements of its trajectory? These are all questions that can be answered with full visibility into the agent’s trajectory.
Regulatory requirements
The EU AI Act reaches full enforcement in August 2026. For organizations operating in the EU or serving EU markets, the requirements for high-risk AI systems are specific and unambiguous. These systems must be capable of automatic logging of events over their operational lifetime. Regulators have been explicit that this must include a step-by-step record of how decisions were made, from inputs to reasoning to outputs. This information needs to be structured and auditable.
This elevates the need to enumerate the agent’s trajectory: a continuous, attributable record of the chain of events that produced each outcome. An inability to do so carries real liability, with fines under the Act reaching up to €35 million or 7% of global annual turnover for the most serious violations.
For US-based organizations, the NIST AI Risk Management Framework serves as the de facto governance standard for federal contractors and regulated industries, with auditability and traceability as foundational requirements. While neither applies universally, both point in the same regulatory direction: organizations must be able to audit what their AI did.
Investment validation
Auditing AI is not simply a security practice. Organizations, including our own, are spending materially on AI tooling. Whether that spend produces intelligent, efficient work or merely increased output remains difficult to answer.
Much like our prior example, two employees leveraging Claude Code on the same project may produce wildly different outcomes. Where one took 8 steps, the other may take 80, touching dozens of files and making redundant calls along the way. The outputs look comparable, but the cost and efficiency may not be. Without trajectory-level visibility, you cannot always see that difference. That means you cannot optimize it, cannot set a baseline, and cannot comprehensively evaluate whether your AI investment is pointed toward the right users and projects.
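The comparison above can be sketched in a few lines once step-level data exists. This is an illustrative example only: the step strings and run names are invented, and the rule of flagging runs above three times the median step count is an arbitrary assumption, not a recommended threshold.

```python
from collections import Counter
from statistics import median
from typing import Dict, List


def summarize(steps: List[str]) -> Dict[str, int]:
    """Count total, distinct, and repeated steps in one run."""
    counts = Counter(steps)
    return {
        "steps": len(steps),
        "distinct": len(counts),
        "redundant": len(steps) - len(counts),   # repeats of an earlier step
    }


# Hypothetical runs: 8 unique steps versus 80 steps cycling over 20 files.
run_short = [f"read file-{i}" for i in range(8)]
run_long = [f"read file-{i % 20}" for i in range(80)]

summary_short = summarize(run_short)
summary_long = summarize(run_long)

# A baseline across several hypothetical runs of the same kind of task.
step_counts = {"run-1": 8, "run-2": 10, "run-3": 12, "run-4": 9, "run-5": 80}
baseline = median(step_counts.values())

# Assumed rule for this sketch: flag anything above 3x the median.
outliers = [name for name, n in step_counts.items() if n > 3 * baseline]

print(summary_short)
print(summary_long)
print(baseline, outliers)
```

None of these numbers can be computed from the prompt and the output alone. They require the ordered list of steps for each run.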
As the conversation shifts from “Are people using AI?” to “Are people using AI well?”, full auditability of AI at scale is required to answer the question.
What observability actually requires
The trajectory cannot be reconstructed after the fact. It has to be observed where the work actually happens: at the endpoint, on the device, at the moment the agent is running.
This is not a philosophical position. It is an architectural one. An agent operating natively on an endpoint (reading files, executing commands, invoking tools, etc.) is doing work that exists only at the system level. No API sits between the agent and the file system. No network proxy intercepts the shell command. The only place the full trajectory can be captured is at the source.
Origin observes AI agent trajectories at the endpoint level. Every reasoning step, tool invocation, file access, network call, and action is logged in sequence, attributed to a specific user and device, and made available as a queryable record of what the agent actually did. Not just what it was asked to do. Not just what it ultimately produced. The complete chain, from intent to execution, is what compliance frameworks are going to require and what executives need in order to evaluate whether their AI workforce is actually performing.
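For readers who want a feel for what a sequenced, attributed, queryable record might look like, here is a minimal sketch. It is not Origin’s schema or API: the `Event` fields, the `query` function, and the sample users, devices, and events are all hypothetical, chosen only to show the three properties named above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Event:
    run_id: str
    seq: int       # order of the event within its run
    user: str      # who the run is attributed to
    device: str    # where the run executed
    kind: str      # hypothetical kinds: "reasoning", "file_access", "tool_call", "network_call"
    detail: str


def query(events: Iterable[Event], *, user: Optional[str] = None,
          device: Optional[str] = None, kind: Optional[str] = None) -> List[Event]:
    """Filter events by any combination of fields, returned in run order."""
    matches = [
        e for e in events
        if (user is None or e.user == user)
        and (device is None or e.device == device)
        and (kind is None or e.kind == kind)
    ]
    return sorted(matches, key=lambda e: (e.run_id, e.seq))


# Hypothetical events, deliberately stored out of order.
events = [
    Event("run-1", 2, "alice", "laptop-01", "file_access", "app/auth.py"),
    Event("run-1", 1, "alice", "laptop-01", "reasoning", "Plan the fix"),
    Event("run-2", 1, "bob", "laptop-02", "network_call", "api.example.com"),
    Event("run-1", 3, "alice", "laptop-01", "tool_call", "pytest"),
]

alice_events = query(events, user="alice")
file_reads = query(events, kind="file_access")

for e in alice_events:
    print(e.run_id, e.seq, e.kind, e.detail)
```

Sorting by run and sequence number restores the order of execution regardless of how events were stored, which is what lets a reviewer read the record as a chain rather than a pile of log lines.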