Reliability & verification
Quitting halfway, narrating instead of doing, and claiming work that didn't happen are the default failure modes of autonomous agents. Exolvra answers them with a three-layer reliability system, a set of guardrails that validate every turn, and receipts that prove work actually landed.
What this page is for
This page explains the machinery that turns “an agent that sometimes works” into “work that gets done and can be verified.” It’s conceptual: you won’t configure most of it, but knowing how it works explains why issues move the way they do.
Three layers of reliability
- Execution engine — agent work runs as a multi-turn loop. After each turn, guardrails check the output; if one fails, the engine feeds specific corrective feedback and the agent tries again, up to a turn cap. The loop tracks phases (Planning → Executing → Saving → Verifying → Reporting) and auto-continues when an agent exhausts its tool budget mid-task.
- Heartbeat watchdog — a background service runs every 30 seconds. It picks up newly assigned open issues, auto-dispatches unassigned ones to the best-fit specialist, and sweeps for issues stuck “in progress” with no activity for a few minutes, nudging the agent to continue.
- Human oversight — the dashboard surfaces stale issues, rate-limit/retry/fallback events on the Activity Timeline, and verification receipts. You can intervene, reassign, or override at any time.
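To make the execution-engine loop concrete, here is a minimal Python sketch of a turn loop with guardrails, corrective retries, a turn cap, and auto-continue on tool-budget exhaustion. It is an illustration, not Exolvra's code: `run_issue`, `TurnResult`, the guardrail signature (return `None` to pass or a correction string to fail), `MAX_TURNS = 8`, and the rule of advancing one phase per passing turn are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Phase(Enum):
    PLANNING = 1
    EXECUTING = 2
    SAVING = 3
    VERIFYING = 4
    REPORTING = 5


@dataclass
class TurnResult:
    output: str
    budget_exhausted: bool = False  # agent ran out of tool calls mid-task
    done: bool = False              # agent claims the task is finished


# Hypothetical guardrail signature: None means pass, a string is corrective feedback.
Guardrail = Callable[[TurnResult], Optional[str]]

MAX_TURNS = 8  # hypothetical turn cap


def run_issue(agent: Callable[[str], TurnResult],
              guardrails: List[Guardrail], task: str) -> dict:
    prompt = task
    phase = Phase.PLANNING
    for turn in range(1, MAX_TURNS + 1):
        result = agent(prompt)
        if result.budget_exhausted:
            # Auto-continue: the agent stopped only because its tool budget ran out.
            # In this sketch the continuation still counts against the turn cap.
            prompt = "Continue from where you left off."
            continue
        failures = [msg for g in guardrails if (msg := g(result)) is not None]
        if failures:
            # Feed specific corrective feedback instead of accepting the turn.
            prompt = "Fix the following before continuing:\n" + "\n".join(failures)
            continue
        if result.done:
            return {"status": "completed", "turns": turn, "phase": Phase.REPORTING}
        # Turn passed but work remains: advance one phase (a simplifying assumption).
        phase = Phase(min(phase.value + 1, Phase.REPORTING.value))
        prompt = f"Proceed with the {phase.name.lower()} phase."
    return {"status": "turn_cap_reached", "turns": MAX_TURNS, "phase": phase}
```

In this sketch a failed guardrail never ends the loop with success; it only changes the next prompt, so the only exits are a clean `done` turn or the turn cap.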
A single authority owns the “is this issue running?” signal, so two entry points can never double-run the same issue or bypass the per-agent concurrency cap.
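A minimal sketch of such a single authority, together with a watchdog sweep that goes through it, might look like this. Everything here is hypothetical: `RunRegistry`, `try_claim`, `watchdog_tick`, the default per-agent cap of 2, and the five-minute `STALE_AFTER` value (the page only says “a few minutes”) are illustrative choices, not Exolvra's API.

```python
import threading
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Optional, Set


class RunRegistry:
    """Hypothetical single owner of the "is this issue running?" signal."""

    def __init__(self, per_agent_cap: int = 2) -> None:
        self._lock = threading.Lock()
        self._running: Dict[str, str] = {}  # issue id -> agent running it
        self._by_agent: Dict[str, Set[str]] = defaultdict(set)
        self.per_agent_cap = per_agent_cap

    def try_claim(self, issue_id: str, agent: str) -> bool:
        # Check and claim under one lock, so two entry points racing on the
        # same issue cannot both succeed.
        with self._lock:
            if issue_id in self._running:
                return False  # already running: refuse a double run
            if len(self._by_agent[agent]) >= self.per_agent_cap:
                return False  # agent is at its concurrency cap
            self._running[issue_id] = agent
            self._by_agent[agent].add(issue_id)
            return True

    def release(self, issue_id: str) -> None:
        with self._lock:
            agent = self._running.pop(issue_id, None)
            if agent is not None:
                self._by_agent[agent].discard(issue_id)


@dataclass
class Issue:
    id: str
    status: str  # "open" or "in_progress" in this sketch
    assignee: Optional[str] = None
    last_activity: float = field(default_factory=time.time)


STALE_AFTER = 5 * 60  # hypothetical stand-in for "a few minutes", in seconds


def watchdog_tick(
    issues: Iterable[Issue],
    registry: RunRegistry,
    pick_specialist: Callable[[Issue], str],
    dispatch: Callable[[Issue], None],
    nudge: Callable[[Issue], None],
    now: Optional[float] = None,
) -> None:
    """One sweep; a scheduler would call this every 30 seconds."""
    now = time.time() if now is None else now
    for issue in issues:
        if issue.status == "open":
            if issue.assignee is None:
                issue.assignee = pick_specialist(issue)  # auto-dispatch to best fit
            # Dispatch only if the single authority grants the claim.
            if registry.try_claim(issue.id, issue.assignee):
                issue.status = "in_progress"
                issue.last_activity = now
                dispatch(issue)
        elif issue.status == "in_progress" and now - issue.last_activity > STALE_AFTER:
            nudge(issue)  # ask the agent to continue
            issue.last_activity = now
```

Routing the watchdog and any manual dispatch through the same `try_claim` is what provides the guarantee in this sketch: the check and the claim happen atomically, so neither path can start an issue the other already holds, or push an agent past its cap.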
Guardrails
Guardrails are checks that run after each turn of an issue execution. They catch the ways agents fall short of done: empty acknowledgments, deferral (“I’ll continue next cycle”), generic filler, missing tool calls, un-persisted research, and more. Some are scope-gated to specific agents (for example, the design and app-builder visual checks). When a guardrail fails, the agent gets a targeted correction prompt and retries; it can’t simply declare success.
The final guardrail is an LLM-scored completeness check that catches nuanced failures the rule-based ones miss.
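Rule-based guardrails with scope gating, followed by a final scored check, might be sketched like this. The specific checks and their wording, the `"design"`/`"app-builder"` scope names, the 5-word acknowledgment heuristic, the 0.7 completeness threshold, and the choice to run the completeness check only after the rule-based ones pass are all assumptions; the LLM scorer is stubbed as a plain callable returning a value between 0 and 1.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Turn:
    agent: str
    text: str
    tool_calls: List[str] = field(default_factory=list)


@dataclass
class Guardrail:
    name: str
    check: Callable[[Turn], Optional[str]]  # None = pass, str = correction prompt
    scope: Optional[Set[str]] = None        # None = applies to every agent


DEFERRAL = re.compile(r"\b(next cycle|next time|i'll continue)\b", re.IGNORECASE)

GUARDRAILS = [
    Guardrail("empty_ack", lambda t: "Your reply is only an acknowledgment; do the work."
              if len(t.text.split()) < 5 else None),
    Guardrail("deferral", lambda t: "Finish now instead of deferring."
              if DEFERRAL.search(t.text) else None),
    Guardrail("missing_tools", lambda t: "You described actions but made no tool calls."
              if not t.tool_calls else None),
    # Hypothetical scope-gated visual check.
    Guardrail("visual_check", lambda t: "Attach a screenshot of the rendered UI."
              if "screenshot" not in t.tool_calls else None,
              scope={"design", "app-builder"}),
]


def run_guardrails(turn: Turn, guardrails: List[Guardrail],
                   completeness_score: Optional[Callable[[Turn], float]] = None,
                   threshold: float = 0.7) -> List[str]:
    corrections = []
    for g in guardrails:
        if g.scope is not None and turn.agent not in g.scope:
            continue  # guardrail does not apply to this agent
        msg = g.check(turn)
        if msg:
            corrections.append(f"[{g.name}] {msg}")
    # Final check: an LLM-scored completeness score (stubbed), run last.
    if not corrections and completeness_score is not None:
        if completeness_score(turn) < threshold:
            corrections.append("[completeness] The deliverable looks incomplete; address the gaps.")
    return corrections
```

An empty list means the turn is accepted; any entries become the targeted correction prompt for the retry.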
Async jobs: park and resume
Some work is genuinely long-running: a build, a long script, an external job. Instead of burning compute on polling, an agent can fire off the job and park. The issue moves to a waiting status that consumes no compute, and the platform auto-resumes the agent with the result when the job finishes. The same pattern powers GitHub PR landing, where an issue parks on CI and resumes on green.
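Park-and-resume is event-driven rather than polled: nothing runs while the issue waits, and the job's completion triggers the resume. A minimal sketch, with hypothetical names throughout (`JobBroker`, `park`, `on_job_finished`, the `"waiting"` status string, and the shape of the result payload):

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Issue:
    id: str
    status: str = "in_progress"
    waiting_on: Optional[str] = None  # id of the job the issue is parked on


class JobBroker:
    """Hypothetical park/resume coordinator: no polling loop anywhere."""

    def __init__(self, resume_agent: Callable[[Issue, dict], None]) -> None:
        self._parked: Dict[str, Issue] = {}  # job id -> parked issue
        self._resume_agent = resume_agent

    def park(self, issue: Issue, job_id: str) -> None:
        # The agent has fired a job; release compute by moving the issue to waiting.
        issue.status = "waiting"
        issue.waiting_on = job_id
        self._parked[job_id] = issue

    def on_job_finished(self, job_id: str, result: dict) -> None:
        # Called by the job runner (or a CI completion event) when the job ends.
        issue = self._parked.pop(job_id, None)
        if issue is None:
            return  # unknown or already-resumed job: ignore duplicates
        issue.status = "in_progress"
        issue.waiting_on = None
        self._resume_agent(issue, result)  # hand the result back to the agent
```

In the GitHub case, the “job” would be a CI run and the completion event would carry its outcome; wiring that event source to `on_job_finished` is not shown here.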
Verification receipts
Reaching “PendingReview” isn’t the same as being correct. A goal-alignment verifier scores a finished deliverable against its goal and writes a verification receipt — a chain-hashed record with a verdict (🟢 pass, 🟡 warn, 🔴 fail). A 🔴 reopens the issue and the same agent reworks it, rather than dumping it on a human. Receipts (including the PR-landing receipt from the GitHub loop) are the durable proof that work actually happened and met its bar.
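“Chain-hashed” means each receipt's hash also covers the previous receipt's hash, so altering any earlier receipt breaks every link after it. A minimal sketch using SHA-256 over canonical JSON; the field set, the all-zeros genesis value, and the `next_status` helper are assumptions, not Exolvra's actual receipt format:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS = "0" * 64  # hypothetical "previous digest" for the first receipt


@dataclass(frozen=True)
class Receipt:
    issue_id: str
    verdict: str  # "pass" (🟢), "warn" (🟡), or "fail" (🔴)
    score: float
    prev_digest: str
    digest: str


def _digest(issue_id: str, verdict: str, score: float, prev_digest: str) -> str:
    # sort_keys makes the JSON canonical, so the same receipt always hashes the same.
    payload = json.dumps({"issue": issue_id, "verdict": verdict,
                          "score": score, "prev": prev_digest}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_receipt(chain: List[Receipt], issue_id: str, verdict: str, score: float) -> Receipt:
    prev = chain[-1].digest if chain else GENESIS
    receipt = Receipt(issue_id, verdict, score, prev, _digest(issue_id, verdict, score, prev))
    chain.append(receipt)
    return receipt


def verify_chain(chain: List[Receipt]) -> bool:
    # Recompute every digest and check each link points at its predecessor.
    prev = GENESIS
    for r in chain:
        if r.prev_digest != prev or r.digest != _digest(r.issue_id, r.verdict, r.score, r.prev_digest):
            return False
        prev = r.digest
    return True


def next_status(verdict: str, current: str) -> str:
    # A fail reopens the issue for the same agent; other verdicts leave it as-is here.
    return "reopened" if verdict == "fail" else current
```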
Related
- Projects, goals, issues — the lifecycle these layers operate on.
- GitHub — park-on-CI and the PR-landing receipt in action.
- Observability and Audit log — where the evidence lands.