Every feature, explained.
Orbis QA is opinionated about which features matter. Here's the full list — what each does, and how it shows up in your day.
Core
What the agent does every run.
Plain-English scenarios
Describe what to test in two sentences. The planner LLM turns it into clicks, types, and assertions at run time.
Recovery sub-loop
When a step fails, a dedicated diagnosis LLM call decides whether to retry, skip the failing assertion, or abort — never a dumb 'retry 3 times'.
Live progress stream
Every step streams as it happens. Watch the agent reason, act, and assert with full screenshots in the Live Run view.
Evidence trail
A screenshot, ARIA snapshot, and network slice are captured at every step and stored on disk for as long as you want. Reports are self-contained HTML.
Suites
Group scenarios into named batches. Run them sequentially with one click; aggregate pass/fail at the end. Set continue-on-failure per suite.
Multi-role sessions
Switch between admin and end-user mid-scenario. The session manager captures cookies and storage for each role and restores them on switch, so nobody has to log in again every time.
Analytics & observability
Know what's happening and why.
Run-vs-run compare
Pick any two runs from history. Side-by-side step diff with drift highlighting — see exactly what changed.
Flake detection
Per-scenario pass rate over the last 20 runs, with a flake badge whenever that rate drops below 100%.
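The rule above is simple enough to sketch. The following is a minimal illustration, not Orbis's actual code: the function names and the list-of-booleans input shape are assumptions made for the example.

```python
from collections import deque

WINDOW = 20  # number of recent runs considered, per the description above


def pass_rate(results):
    """Return the fraction of passing runs among the most recent WINDOW results.

    `results` is an iterable of booleans, oldest first (a hypothetical shape).
    """
    recent = deque(results, maxlen=WINDOW)  # keeps only the last WINDOW entries
    if not recent:
        return None  # no history yet, so there is no rate to report
    return sum(recent) / len(recent)


def is_flaky(results):
    """Flag a scenario whose recent pass rate is below 100%."""
    rate = pass_rate(results)
    return rate is not None and rate < 1.0
```

Under this sketch a single failure in the last 20 runs is enough to raise the badge, and failures older than the window no longer count.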
Cost ledger
Today, this week, this month — broken down by model. Know exactly what your QA agent is spending on your LLM key.
Activity log
An in-memory ring buffer of recent engine events. Always on, never persisted, never sent off-device, so it stays private whether or not you opt in to telemetry.
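An in-memory ring of events can be pictured as a fixed-size queue that silently drops its oldest entry. This is a rough sketch only: the class name, the event fields, and the capacity of 500 are assumptions, not documented Orbis values.

```python
import time
from collections import deque


class ActivityLog:
    """Fixed-capacity, in-memory event ring (hypothetical sketch)."""

    def __init__(self, capacity=500):  # capacity is an assumed value
        # deque with maxlen discards the oldest item once it is full
        self._events = deque(maxlen=capacity)

    def record(self, kind, message):
        """Append one event; nothing is written to disk or sent anywhere."""
        self._events.append({"ts": time.time(), "kind": kind, "message": message})

    def recent(self, n=None):
        """Return the last n events, oldest first (all of them if n is None)."""
        events = list(self._events)
        if n is None:
            return events
        return events[max(len(events) - n, 0):]


# Small demonstration: five events into a ring of three
log = ActivityLog(capacity=3)
for i in range(5):
    log.record("engine", f"e{i}")
kept = [e["message"] for e in log.recent()]
```

Because the ring lives only in process memory, it disappears when the app exits, which matches the "never persisted" behavior described above.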
Trust & robustness
Built like a tool you'd run unattended overnight.
Local-first by default
Scenarios, runs, screenshots — all on your machine. No cloud, no account, no telemetry unless you opt in.
Sandboxed screenshots
Custom protocol handler serves screenshot assets only from your data folder. Even a compromised renderer can't read other files.
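The core of that guarantee is a path-containment check. The sketch below shows the general technique in Python; it is not Orbis's handler, and the function name and example paths are hypothetical.

```python
from pathlib import Path


def resolve_asset(data_dir, requested):
    """Return the absolute path for `requested` only if it stays inside data_dir.

    Returns None for anything that escapes the folder, such as `../` traversal
    or an absolute path.
    """
    root = Path(data_dir).resolve()
    # resolve() normalizes ".." segments and follows symlinks before the check
    candidate = (root / requested).resolve()
    try:
        candidate.relative_to(root)  # raises ValueError when outside root
    except ValueError:
        return None
    return candidate
```

A handler built this way would serve the file only when the result is not None, so a request for `../../etc/passwd` is refused before any file is opened.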
Engine supervisor
If the engine crashes, the supervisor respawns it with backoff (max 3 attempts). Dangling runs are automatically marked as errored, so you never see phantom 'running' rows.
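A respawn loop with backoff might look roughly like this. Only the limit of 3 attempts comes from the description above; the doubling delay, the 1-second base, and every name in the sketch are assumptions.

```python
import time

MAX_RESPAWNS = 3  # "max 3 attempts" from the description above


def supervise(run_engine, base_delay=1.0, sleep=time.sleep):
    """Run the engine, respawning after a crash with exponential backoff.

    `run_engine` is a hypothetical callable that returns on clean exit and
    raises on crash. Returns True on clean exit, False once respawns run out.
    """
    respawns = 0
    while True:
        try:
            run_engine()
            return True
        except Exception:
            if respawns >= MAX_RESPAWNS:
                return False  # give up; the caller can mark open runs as errored
            sleep(base_delay * 2 ** respawns)  # 1s, 2s, 4s under these assumptions
            respawns += 1


# Demonstration: an engine that crashes twice, then exits cleanly.
delays = []
outcomes = iter([True, True, False])  # True means "crash on this start"


def flaky_engine():
    if next(outcomes):
        raise RuntimeError("engine crashed")


recovered = supervise(flaky_engine, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; here `delays` ends up recording the backoff the supervisor would have slept.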
Update checks
The app checks a static JSON endpoint on launch. A subtle StatusBar pill tells you when a new version is available, and required updates are supported.
Power-user workflow
Day-2 polish for SQA engineers who use this constantly.
Cancel mid-run
The Cancel button in Live Run stops the agent after the current step and leaves no orphan processes behind.
Search + filter
Every list (Scenarios, History) has search plus tag and status filters. The ⌘K command palette jumps between everything in two keystrokes.
Cost cap per run
Set a cents-per-run ceiling in Settings. The agent stops itself if a runaway scenario hits the cap — your LLM bill stays predictable.
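One way to picture the cap is a per-run budget that every LLM call is charged against. This is a hedged sketch: the class and exception names are invented for illustration, and the costs are made-up numbers.

```python
class CostCapExceeded(Exception):
    """Raised when a run's accumulated spend reaches its cap."""


class RunBudget:
    """Tracks LLM spend for one run against a cents ceiling (hypothetical)."""

    def __init__(self, cap_cents):
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def charge(self, cents):
        """Record the cost of one LLM call and stop the run at the cap."""
        self.spent_cents += cents
        if self.spent_cents >= self.cap_cents:
            raise CostCapExceeded(
                f"spent {self.spent_cents} of {self.cap_cents} cents"
            )


# Demonstration: a 10-cent cap with steps that cost 3 cents each.
budget = RunBudget(cap_cents=10)
completed_steps = 0
stopped_by_cap = False
for step_cost in [3, 3, 3, 3, 3]:
    try:
        budget.charge(step_cost)
    except CostCapExceeded:
        stopped_by_cap = True
        break
    completed_steps += 1
```

In this sketch the check runs after each charge, so a run can overshoot the cap by at most the cost of one call before it stops.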
Retention purge
Delete runs older than N days with one click. Screenshots go with them — keep your data folder lean.
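The selection step behind a purge like this is a simple cutoff comparison. The sketch below assumes each run is a dict with an `id` and a timezone-aware `finished_at`; those field names and the function name are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def split_by_retention(runs, keep_days, now=None):
    """Split runs into (keep, purge) using a cutoff of `keep_days` before now."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=keep_days)
    keep = [r for r in runs if r["finished_at"] >= cutoff]
    purge = [r for r in runs if r["finished_at"] < cutoff]
    # A real purge would then delete each purged run's record together with
    # its screenshot folder; that part is omitted here.
    return keep, purge


# Demonstration with a fixed "now" so the result is reproducible.
utc = timezone.utc
example_runs = [
    {"id": "a", "finished_at": datetime(2025, 1, 30, tzinfo=utc)},
    {"id": "b", "finished_at": datetime(2025, 1, 1, tzinfo=utc)},
    {"id": "c", "finished_at": datetime(2024, 12, 1, tzinfo=utc)},
]
kept_runs, purged_runs = split_by_retention(
    example_runs, keep_days=14, now=datetime(2025, 1, 31, tzinfo=utc)
)
```

With a 14-day window and the dates above, only run `a` survives; `b` and `c` fall before the cutoff.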
See it work on your laptop in a minute
Download Orbis, paste an LLM key, describe your first test. The wizard walks you through everything else.