What's in the box

Every feature, explained.

Orbis QA is opinionated about which features matter. Here's the full list — what each does, and how it shows up in your day.

01

Core

What the agent does every run.

Plain-English scenarios

Describe what to test in two sentences. The planner LLM turns it into clicks, types, and assertions at run time.

Recovery sub-loop

When a step fails, a dedicated diagnosis LLM call decides whether to retry, skip the failing assertion, or abort — never a dumb 'retry 3 times'.
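
As a rough illustration of that decision loop, here is a minimal sketch. The names `RecoveryDecision`, `runStep`, and the injected `diagnose` callback (standing in for the diagnosis LLM call) are hypothetical, not Orbis QA's actual API. The `maxDiagnoses` guard is an assumption added only to keep the example bounded, and the loop is synchronous for brevity.

```typescript
// Hypothetical shapes for illustration; the real engine's types are not documented here.
type RecoveryDecision = "retry" | "skip_assertion" | "abort";
type StepOutcome = "passed" | "skipped" | "aborted";

// attempt() returns true when the step succeeds.
// diagnose() stands in for the diagnosis LLM call and receives the failure count so far.
function runStep(
  attempt: () => boolean,
  diagnose: (failures: number) => RecoveryDecision,
  maxDiagnoses = 5, // assumed safety bound, not an Orbis setting
): StepOutcome {
  let failures = 0;
  while (true) {
    if (attempt()) return "passed";
    failures += 1;
    if (failures > maxDiagnoses) return "aborted";
    const decision = diagnose(failures);
    if (decision === "skip_assertion") return "skipped";
    if (decision === "abort") return "aborted";
    // decision === "retry": loop around and try the step again
  }
}
```

The point of the shape is that each failure produces a fresh decision, rather than a fixed retry count applied blindly.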

Live progress stream

Every step streams as it happens. Watch the agent reason, act, and assert with full screenshots in the Live Run view.

Evidence trail

Screenshot + ARIA snapshot + network slice captured at every step, stored on disk for as long as you want. Reports are self-contained HTML.

Suites

Group scenarios into named batches. Run them sequentially with one click; aggregate pass/fail at the end. Set continue-on-failure per suite.
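
The continue-on-failure behaviour can be sketched as follows. This is a simplified illustration under assumed names (`Scenario`, `SuiteResult`, `runSuite` are hypothetical), with scenarios modelled as synchronous functions that return `true` on pass.

```typescript
// Hypothetical types for illustration only.
interface Scenario {
  id: string;
  run: () => boolean; // true means the scenario passed
}

interface SuiteResult {
  passed: string[];
  failed: string[];
  notRun: string[]; // scenarios skipped because the suite halted early
}

function runSuite(scenarios: Scenario[], continueOnFailure: boolean): SuiteResult {
  const result: SuiteResult = { passed: [], failed: [], notRun: [] };
  let halted = false;
  for (const scenario of scenarios) {
    if (halted) {
      result.notRun.push(scenario.id);
      continue;
    }
    if (scenario.run()) {
      result.passed.push(scenario.id);
    } else {
      result.failed.push(scenario.id);
      if (!continueOnFailure) halted = true; // stop at the first failure
    }
  }
  return result;
}
```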

Multi-role sessions

Switch between admin and end-user mid-scenario. The session manager captures cookies and storage and restores them per role, so you don't log in again every time.

02

Analytics & observability

Know what's happening and why.

Run-vs-run compare

Pick any two runs from history. Side-by-side step diff with drift highlighting — see exactly what changed.

Flake detection

Per-scenario pass-rate over the last 20 runs, with a flake badge when consistency drops below 100%.
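
A minimal sketch of that calculation, assuming "consistency" means the pass rate over the window (the function names and the boolean run representation are hypothetical):

```typescript
const RUN_WINDOW = 20; // from the text: the last 20 runs

// runsOldestFirst holds one entry per run: true = pass, false = fail.
function passRate(runsOldestFirst: boolean[]): number | null {
  const recent = runsOldestFirst.slice(-RUN_WINDOW);
  if (recent.length === 0) return null; // no runs yet, so no rate
  const passes = recent.filter(Boolean).length;
  return passes / recent.length;
}

// Flags the scenario when the windowed pass rate is below 100%.
function isFlaky(runsOldestFirst: boolean[]): boolean {
  const rate = passRate(runsOldestFirst);
  return rate !== null && rate < 1;
}
```

Under this literal reading, a scenario that fails every run is flagged too. The product may well separate "always failing" from "flaky"; the sketch does not.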

Cost ledger

Today, this week, this month — broken down by model. Know exactly what your QA agent is spending on your LLM key.

Activity log

In-memory ring buffer of recent engine events. Always on, never persisted, never sent off-device, so it stays private whatever your telemetry setting.
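
For readers unfamiliar with the data structure, a fixed-capacity ring that overwrites its oldest entry looks roughly like this. `ActivityRing` is a hypothetical name and the capacity is arbitrary; the sketch only shows why such a log stays bounded and lives purely in memory.

```typescript
class ActivityRing<T> {
  private readonly capacity: number;
  private readonly slots: (T | undefined)[];
  private next = 0;  // index where the next event will be written
  private count = 0; // number of events currently held

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity);
  }

  // Adds an event, overwriting the oldest one once the ring is full.
  push(event: T): void {
    this.slots[this.next] = event;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Returns the held events, oldest first.
  toArray(): T[] {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity] as T);
    }
    return out;
  }
}
```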

03

Trust & robustness

Built like a tool you'd run unattended overnight.

Local-first by default

Scenarios, runs, screenshots — all on your machine. No cloud, no account, no telemetry unless you opt in.

Sandboxed screenshots

Custom protocol handler serves screenshot assets only from your data folder. Even a compromised renderer can't read other files.
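
The core of such a handler is a containment check on the requested path. The following is a simplified sketch, not Orbis QA's code: `resolveInsideRoot` is a hypothetical helper, it assumes "/"-separated paths, and it treats every request as relative to the data folder.

```typescript
// Resolves a requested path against a root folder, or returns null when the
// request is malformed or would climb above the root.
function resolveInsideRoot(root: string, requested: string): string | null {
  let decoded: string;
  try {
    decoded = decodeURIComponent(requested);
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.includes("\0")) return null; // reject embedded NUL bytes

  const segments: string[] = [];
  for (const part of decoded.split(/[\\/]+/)) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (segments.length === 0) return null; // would escape the root
      segments.pop();
      continue;
    }
    segments.push(part);
  }
  if (segments.length === 0) return null; // the root itself is not an asset
  return root.replace(/\/+$/, "") + "/" + segments.join("/");
}
```

A production handler would likely lean on the platform's path APIs and also account for symlinks, which this string-level check does not.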

Engine supervisor

If the engine crashes, the supervisor respawns it with backoff (max 3 attempts). Runs left dangling are automatically marked as errored, so you never see phantom 'running' rows.
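
To make the two behaviours concrete, here is a small sketch. The limit of 3 comes from the text above; the doubling delay schedule, the 1000 ms base, and the names `restartDelayMs`, `RunRow`, and `markDanglingRuns` are assumptions made for illustration.

```typescript
const MAX_RESTARTS = 3; // from the text: max 3 attempts

// Returns the delay before restart number `attempt` (1-based), or null when
// the supervisor should give up. The doubling schedule is an assumption.
function restartDelayMs(attempt: number, baseMs = 1000): number | null {
  if (attempt < 1 || attempt > MAX_RESTARTS) return null;
  return baseMs * 2 ** (attempt - 1);
}

// Hypothetical row shape for the run history.
interface RunRow {
  id: string;
  status: "running" | "passed" | "failed" | "error";
}

// After a crash, any run still marked "running" has no engine behind it,
// so it is rewritten as "error". Other rows are returned unchanged.
function markDanglingRuns(rows: RunRow[]): RunRow[] {
  return rows.map((row): RunRow =>
    row.status === "running" ? { id: row.id, status: "error" } : row,
  );
}
```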

Update checks

On launch, the app checks a static JSON endpoint. A subtle StatusBar pill tells you when a new version is available, and required updates are supported.

04

Power-user workflow

Day-2 polish for SQAs who use this constantly.

Cancel mid-run

The Cancel button in Live Run stops the agent after the current step and leaves no orphan processes behind.

Search + filter

Every list (Scenarios, History) has search + tag/status filters. ⌘K command palette jumps between everything in two keystrokes.

Cost cap per run

Set a cents-per-run ceiling in Settings. The agent stops itself if a runaway scenario hits the cap — your LLM bill stays predictable.
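
The cap amounts to a running total checked after each LLM call. A minimal sketch, with `RunBudget` as a hypothetical name and costs tracked in cents:

```typescript
class RunBudget {
  private readonly capCents: number;
  private spentCents = 0;

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Records the cost of one LLM call and reports whether the run may
  // continue. Reaching the cap exactly counts as hitting it.
  record(costCents: number): boolean {
    this.spentCents += costCents;
    return this.spentCents < this.capCents;
  }

  get spent(): number {
    return this.spentCents;
  }
}
```

In this sketch the call that crosses the cap is still paid for; the check only prevents the next one.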

Retention purge

Delete runs older than N days with one click. Screenshots go with them — keep your data folder lean.
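
Selecting what to purge is a date comparison against a cutoff. The sketch below only plans the deletion and leaves actual file removal to the caller. `StoredRun` and `planPurge` are hypothetical names, and using the run's finish time as its age is an assumption.

```typescript
// Hypothetical record shape for a stored run.
interface StoredRun {
  id: string;
  finishedAt: Date;
  screenshotPaths: string[];
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the run ids and screenshot paths that are older than the cutoff.
function planPurge(
  runs: StoredRun[],
  olderThanDays: number,
  now: Date,
): { runIds: string[]; screenshotPaths: string[] } {
  const cutoff = now.getTime() - olderThanDays * MS_PER_DAY;
  const runIds: string[] = [];
  const screenshotPaths: string[] = [];
  for (const run of runs) {
    if (run.finishedAt.getTime() < cutoff) {
      runIds.push(run.id);
      screenshotPaths.push(...run.screenshotPaths); // screenshots go with the run
    }
  }
  return { runIds, screenshotPaths };
}
```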

See it work on your laptop in a minute

Download Orbis, paste an LLM key, describe your first test. The wizard walks you through everything else.