self-heal v0.4.0
Ship agents to production with a repair loop, audit trail, and sandbox

The reliability layer for AI agents

Your LLM agents fail silently in production. self-heal catches the failure, proposes a fix with memory of prior attempts, verifies it against your tests, and retries until it passes. One decorator.

View on GitHub
pip install self-heal-llm
from self_heal import repair

def test_dollar_comma(fn):
    assert fn("$1,299") == 1299.0

def test_rupee(fn):
    assert fn("Rs 500") == 500.0

@repair(tests=[test_dollar_comma, test_rupee])
def extract_price(text: str) -> float:
    # Naive: only handles "$X.YY"
    return float(text.replace("$", ""))

# Triggers the repair loop until ALL tests pass.
extract_price("Rs 500")

Works with Claude, OpenAI, Gemini, and 100+ providers via LiteLLM. Sync and async. One decorator.

Default suite (19 hand-written bugs)
Naive: 16 / 19 (84%)
self-heal: 18 / 19 (95%)

QuixBugs (31 classic bugs)
Naive: 27 / 31 (87%)
self-heal: 29 / 31 (94%)
Providers out of the box
Claude, OpenAI, Gemini, LiteLLM (100+), Ollama, vLLM, OpenRouter, Groq
Or bring your own: implement one propose() method.
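A bring-your-own provider could look roughly like this (a hypothetical sketch; the real `propose()` signature may differ, and `client.complete` stands in for whatever your model client exposes):

```python
class MyAdapter:
    """Minimal custom provider: turn a failure report into new source code."""

    def __init__(self, client):
        self.client = client  # your own model client (assumed interface)

    def propose(self, source: str, failures: list, history: list) -> str:
        prompt = (
            "Fix this function so the failing tests pass.\n"
            f"Source:\n{source}\n"
            "Failures:\n" + "\n".join(str(f) for f in failures)
        )
        # Return replacement source for the broken function.
        return self.client.complete(prompt)
```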

Gemini 2.5 Flash, 3 max attempts, v0.4 harness. Full numbers in benchmarks/RESULTS.md.

What the library ships today

Three-layer safety

AST rails block dangerous calls. Subprocess sandbox runs proposals in python -I with pickle IPC. Opt-in, combinable.
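The AST-rail idea, reduced to its essence, is to parse a proposal and reject known-dangerous calls before anything executes (a simplified sketch with an illustrative blocklist, not the library's actual rule set):

```python
import ast

BLOCKED = {"eval", "exec", "system", "popen", "rmtree"}  # illustrative only

def passes_rails(source: str) -> bool:
    """Reject proposals whose AST contains a call to a blocked name."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both bare calls (eval(...)) and attribute calls (os.system(...)).
            name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
            if name in BLOCKED:
                return False
    return True
```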

pytest --heal-apply

Your failing tests trigger repair. libcst writes the accepted fix back to the file with a git-dirty guard and backup.

Streaming + async

Native apropose() on all four adapters. propose_chunk events stream tokens for agent UIs and observability.
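Consuming such a token stream might look like this (a sketch: the `propose_chunk` event name comes from the description above, but the stream shape and consuming code are assumptions):

```python
import asyncio

async def fake_propose_stream():
    # Stand-in for an adapter's streaming apropose(); yields chunk events.
    for token in ["def ", "fixed", "(x):", " return x"]:
        await asyncio.sleep(0)  # yield control, as a real network call would
        yield {"type": "propose_chunk", "token": token}

async def collect_proposal() -> str:
    parts = []
    async for event in fake_propose_stream():
        if event["type"] == "propose_chunk":
            parts.append(event["token"])  # forward tokens to your agent UI here
    return "".join(parts)
```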

The hosted control plane is coming

Audit log, policy engine, approval workflows, per-function health dashboards. Built on top of the OSS library. Bring your own API keys.

The OSS library is free forever. The waitlist is for the hosted control plane (audit log, policy engine, observability).