The retry primitive was correct. The callers weren’t.

I build reliability infrastructure for AI agent systems and write about what I find. Most agent pipelines fail the same way: retry logic that treats every error identically, no budget on retries, no floor on Retry-After, no explicit intent before irreversible actions fire. I've spent two years building tooling for that failure surface — Pitstop Scan, Guard, and intent-gate. If your agents are silently degrading in production, this is where I think out loud about why.

Last week I wrote about how 429s turn into retry storms.
This week I started scanning real repos.
Same pattern.
Over and over.

So I built a small CLI to flag it:

npx pitstop-check ./src

It scans TypeScript/JavaScript and flags:

  • 429 handled without Retry-After

  • blanket retry of all 429s (no CAP vs WAIT distinction)

  • unbounded retry loops


Example: OpenClaw
https://github.com/openclaw/openclaw/issues/50866

[WARN] src/agents/venice-models.ts:24 --- 429 handled without Retry-After
[WARN] src/agents/venice-models.ts:24 --- All 429s treated as retryable (CAP vs WAIT not distinguished)

pitstop-check found 2 issues

What's happening:

  • API returns Retry-After: 600

  • client ignores it

  • retries on its own schedule

  • multiple agents retry independently

  • request rate increases under throttling → sustained failure
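To make the first two steps concrete, here is a minimal sketch of reading the server's hint before picking a delay. The names `parseRetryAfterMs` and `nextDelayMs` and the default backoff values are illustrative assumptions, not code from OpenClaw or pitstop-check. The sketch relies on one fact about HTTP: a Retry-After value is either a whole number of seconds or an HTTP-date.

```typescript
// Hypothetical helper (not from any real codebase): converts a Retry-After
// header value into a delay in milliseconds, or null if it is absent or unparseable.
function parseRetryAfterMs(header: string | null, nowMs: number = Date.now()): number | null {
  if (header === null) return null;
  const value = header.trim();
  // Form 1: delay in whole seconds, e.g. "600".
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(value);
  if (Number.isNaN(dateMs)) return null;
  // A date in the past means "retry now", never a negative delay.
  return Math.max(0, dateMs - nowMs);
}

// Hypothetical policy: prefer the server's hint; fall back to capped
// exponential backoff only when no usable hint exists.
function nextDelayMs(
  header: string | null,
  attempt: number,
  baseMs: number = 500,
  maxBackoffMs: number = 30_000,
): number {
  const hinted = parseRetryAfterMs(header);
  if (hinted !== null) return hinted;
  return Math.min(maxBackoffMs, baseMs * 2 ** attempt);
}
```

With `Retry-After: 600`, `nextDelayMs("600", 0)` yields a ten-minute wait, where the client's own schedule would have fired again after half a second.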


What surprised me

The retry primitive is often correct.
The call sites aren't.

In OpenClaw:

  • retry.ts supports retryAfterMs

  • it's implemented correctly

  • it's tested

But:

  • venice-models.ts

  • compaction.ts

  • memory-tool.ts

all call retryAsync without passing retryAfterMs.
The correct behavior exists in the codebase. It's just not wired up.
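The gap is easy to miss because both call sites look reasonable. The sketch below uses a stand-in primitive to show the difference. `retryWithHint`, `RetryOptions`, `delayForAttempt`, and `isRateLimited` are hypothetical names invented for illustration; this is not OpenClaw's retryAsync signature, which I'm not reproducing here.

```typescript
// Hypothetical stand-in for a retry primitive. NOT OpenClaw's actual API.
interface RetryOptions {
  attempts: number;
  baseDelayMs: number;
  // Optional hook: pull a server-provided delay out of the error, if any.
  retryAfterMs?: (err: unknown) => number | undefined;
}

// Pure delay calculation: use the hint when the hook supplies one,
// otherwise fall back to exponential backoff.
function delayForAttempt(err: unknown, attempt: number, opts: RetryOptions): number {
  const hinted = opts.retryAfterMs?.(err);
  return hinted ?? opts.baseDelayMs * 2 ** attempt;
}

async function retryWithHint<T>(
  fn: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Sleep between attempts, but not after the final one.
      if (attempt < opts.attempts - 1) await sleep(delayForAttempt(err, attempt, opts));
    }
  }
  throw lastErr;
}

// Hypothetical error shape: an error carrying an already-parsed Retry-After.
interface RateLimited { retryAfterMs: number }
function isRateLimited(err: unknown): err is RateLimited {
  return typeof err === "object" && err !== null &&
    typeof (err as { retryAfterMs?: unknown }).retryAfterMs === "number";
}

// Call site A: valid, compiles, and ignores the server's hint.
const unwired: RetryOptions = { attempts: 3, baseDelayMs: 500 };

// Call site B: same primitive, hint wired through.
const wired: RetryOptions = {
  ...unwired,
  retryAfterMs: (err) => (isRateLimited(err) ? err.retryAfterMs : undefined),
};

const rateLimited = Object.assign(new Error("429 Too Many Requests"), { retryAfterMs: 600_000 });
```

For the same error, `delayForAttempt(rateLimited, 0, unwired)` falls back to the 500 ms base delay, while `delayForAttempt(rateLimited, 0, wired)` returns the server's 600,000 ms. Nothing in the primitive is wrong in either case; the optional parameter just makes the unwired call site indistinguishable from a correct one at a glance.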


The pattern

Most systems collapse three distinct cases:

  • WAIT → respect Retry-After

  • CAP → bound retries / reduce load

  • STOP → fail fast

into:

retry()

That's how you get retry storms.
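One way to keep the three cases apart is to make the decision an explicit value instead of an implicit loop. This is a rough sketch under my own assumptions: the `Decision` and `Failure` types, the `classify` function, and the specific rules (429 with a usable Retry-After is WAIT, other retryable failures are CAP, everything else is STOP) are illustrative choices, not how pitstop-check or any particular library defines them.

```typescript
// Hypothetical decision type: each case carries only what the caller needs.
type Decision =
  | { kind: "WAIT"; delayMs: number }          // server told us when to come back
  | { kind: "CAP"; remainingAttempts: number } // retry, but against a budget
  | { kind: "STOP"; reason: string };          // fail fast

// Hypothetical failure shape: status code plus an already-parsed Retry-After.
interface Failure {
  status: number;
  retryAfterMs: number | null;
}

// Assumed rules for illustration only; real systems may draw these lines differently.
function classify(failure: Failure, attemptsUsed: number, maxAttempts: number = 3): Decision {
  const retryable = failure.status === 429 || failure.status >= 500;
  if (!retryable) {
    return { kind: "STOP", reason: "non-retryable status " + failure.status };
  }
  // The budget applies to every retryable case, so no path is unbounded.
  const remainingAttempts = maxAttempts - attemptsUsed;
  if (remainingAttempts <= 0) {
    return { kind: "STOP", reason: "retry budget exhausted" };
  }
  if (failure.status === 429 && failure.retryAfterMs !== null) {
    return { kind: "WAIT", delayMs: failure.retryAfterMs };
  }
  return { kind: "CAP", remainingAttempts };
}
```

The caller then has to handle each `kind` on purpose, which makes a blanket retry harder to write by accident.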


Repo: https://github.com/SirBrenton/pitstop-check
If you run this on your code and it flags something interesting, I'd genuinely like to see it.
