No cloud, no telemetry by default. Scenarios, runs, screenshots — all live in your data folder, only on your laptop.
Anthropic, OpenAI, Gemini, DeepSeek, xAI, Mistral, OpenRouter, Ollama, or any OpenAI-compatible endpoint. The key is encrypted in your OS keychain — Orbis never sees it.
Custom protocol sandbox for screenshots, locked DevTools in production, error-message scrubbing, and an integrity-checked asar bundle.
Four steps. No selectors.
You describe what to test. The agent observes, reasons, acts, and asserts in a loop until your scenario is done — or it tells you why it couldn't be.
- 1. Perceive
Snapshot the page: ARIA tree, screenshot, recent network slice.
- 2. Reason
The planner LLM picks the next action and its assertions, with the scenario prose in context.
- 3. Act
Click, type, scroll, navigate, all driven through the real browser, not a headless simulation.
- 4. Assert + recover
Check assertions. If something failed, the recoverer LLM diagnoses the failure and decides whether to retry, skip, or abort.
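The four steps above can be read as a simple control loop. The sketch below is illustrative only and is not Orbis's actual code: `Step`, `run_scenario`, and the `perceive`, `plan`, `act`, `check`, and `recover` callables are hypothetical names standing in for the browser snapshot, the planner LLM, the browser driver, the assertion checker, and the recoverer LLM. The `max_steps` limit is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    action: str                                        # e.g. "click 'Sign in'"
    assertions: List[str] = field(default_factory=list)
    done: bool = False                                 # planner reports the scenario is finished


def run_scenario(scenario, perceive, plan, act, check, recover, max_steps=20):
    """Hypothetical perceive/reason/act/assert loop with a recovery branch."""
    history: List[str] = []
    for _ in range(max_steps):
        snapshot = perceive()                          # 1. Perceive
        step = plan(scenario, snapshot, history)       # 2. Reason
        if step.done:
            return "passed", history
        act(step.action)                               # 3. Act
        failed = [a for a in step.assertions if not check(a)]   # 4. Assert
        if failed:
            decision = recover(step, failed)           # "retry", "skip", or "abort"
            if decision == "abort":
                return "aborted", history
            if decision == "retry":
                continue                               # re-perceive and re-plan this step
            history.append(f"skipped: {step.action}")
        else:
            history.append(step.action)
    return "step limit reached", history
```

Passing the five stages in as plain callables keeps the loop testable with stubs, without a browser or an LLM key.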
Everything an SQA needs. Nothing they don't.
Plain-English scenarios
Describe the user journey in two sentences. The planner LLM figures out clicks, types, and assertions step-by-step.
Reasoning recovery
When a step fails, a dedicated diagnosis call decides whether to retry, skip, or abort — not a brittle 'retry 3 times'.
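One way to picture the diagnosis call is as a function that turns the recoverer LLM's reply into one of three decisions. The sketch below is an assumption about how that could look, not a description of Orbis internals: `Decision`, `parse_decision`, the reply format (decision word first), and the `max_retries` safety ceiling are all hypothetical.

```python
from enum import Enum


class Decision(Enum):
    RETRY = "retry"
    SKIP = "skip"
    ABORT = "abort"


def parse_decision(llm_reply: str, retries_so_far: int, max_retries: int = 2) -> Decision:
    """Map a diagnosis reply such as 'retry: modal still animating' to a Decision.

    Assumes the reply starts with the decision word. Unclear replies fall back
    to ABORT so a confused model cannot keep a run going.
    """
    words = llm_reply.strip().lower().split()
    first = words[0].strip(".,:;") if words else ""
    try:
        decision = Decision(first)
    except ValueError:
        return Decision.ABORT
    # Hypothetical safeguard: the model decides, but retries still have a ceiling.
    if decision is Decision.RETRY and retries_so_far >= max_retries:
        return Decision.ABORT
    return decision
```

The difference from a fixed "retry 3 times" policy is that the retry only happens when the diagnosis asks for it; the ceiling here is just a backstop.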
Suites
Group scenarios into named batches. Run nightly smoke tests as one command; aggregate pass/fail at the end.
Run-vs-run compare
Side-by-side diff with drift highlighting. Spot what actually changed between yesterday's run and today's.
Live progress + cost cap
Watch every step stream live with screenshots. A per-run cost cap, set in cents, stops a runaway scenario from draining your key.
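A per-run cap of this kind can be sketched as a running meter that is charged after every LLM call. This is a minimal illustration under stated assumptions: `CostMeter`, `CostCapExceeded`, and the price parameters (cents per million tokens) are hypothetical, and real prices depend on the provider and model you choose.

```python
class CostCapExceeded(Exception):
    """Raised when a run's accumulated spend goes over its cap."""


class CostMeter:
    def __init__(self, cap_cents: float):
        self.cap_cents = cap_cents
        self.spent_cents = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               in_cents_per_mtok: float, out_cents_per_mtok: float) -> float:
        """Add the cost of one LLM call and stop the run if the cap is exceeded."""
        cost = (input_tokens * in_cents_per_mtok
                + output_tokens * out_cents_per_mtok) / 1_000_000
        self.spent_cents += cost
        if self.spent_cents > self.cap_cents:
            raise CostCapExceeded(
                f"spent {self.spent_cents:.2f} cents, cap is {self.cap_cents:.2f}"
            )
        return cost
```

Calling `charge` after each planner or recoverer call means the run is halted by the first call that pushes spend past the cap, rather than at the end of the scenario.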
Flake detection
Pass-rate per scenario over the last 20 runs. Real flake telemetry without setting up dashboards.
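The pass-rate figure is straightforward arithmetic over a sliding window of run results. The sketch below assumes each run is recorded as a boolean (passed or not), oldest first; the function names and the "strictly between 0% and 100%" definition of flaky are illustrative assumptions, not Orbis's documented rule.

```python
from typing import Optional, Sequence


def pass_rate(results: Sequence[bool], window: int = 20) -> Optional[float]:
    """Percentage of passing runs among the most recent `window` results."""
    recent = list(results)[-window:]
    if not recent:
        return None                      # no runs yet, so no rate to report
    return 100.0 * sum(recent) / len(recent)


def is_flaky(results: Sequence[bool], window: int = 20) -> bool:
    """Treat a scenario as flaky if it neither always passes nor always fails."""
    rate = pass_rate(results, window)
    return rate is not None and 0.0 < rate < 100.0
```

For example, 18 passes and 2 failures in the last 20 runs give a 90% pass rate, which this sketch would flag as flaky.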
Playwright + Cypress are great. If you have engineers to maintain them.
- Write a script. UI ships. Selectors break. Rewrite.
- Every new flow needs a new test, fixtures, and CI plumbing.
- Flakes get retried until they're 'green' enough.
- QA people who can't code wait for engineers.
- "Sign in, approve the pending subscription, verify status." → it runs.
- No selectors. UI redesigns don't break your scenarios.
- Recovery sub-loop diagnoses failures; flaky scenarios get flagged with a pass-rate percentage.
- SQAs ship scenarios on their own. Engineers stay focused.
Start running scenarios in under a minute.
Download Orbis QA, paste your LLM key (Gemini 2.0 Flash is a cheap, fast default), and describe your first test. No account. No tracking.