midscene
AI-powered, vision-driven UI automation for every platform.
About midscene
Open-source, vision-driven UI testing — write tests in natural language, automate any platform.
Most UI automation, including AI tools that read the DOM or the accessibility tree, depends on page structure. That structure is fragile and incomplete: selectors break when markup is refactored; elements without semantic markup (icon-only buttons, custom controls) are invisible to structure-based tools; native apps and cross-origin iframes are out of reach; and page structure cannot tell you whether something actually looks right. Midscene works from the screenshot alone, and you describe each step in natural language.
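To make "describe each step in natural language" concrete, here is a minimal sketch of what such a test might look like. The `VisionAgent` interface, the `RecordingAgent` stand-in, and the `searchFlow` function are hypothetical names invented for this illustration. They are not Midscene's actual API, so consult the project's documentation for the real method names.

```typescript
// Hypothetical sketch: none of these names come from Midscene itself.
type Step =
  | { kind: "act"; instruction: string }
  | { kind: "assert"; expectation: string };

// Assumed shape of a vision-driven agent: every step is plain language.
interface VisionAgent {
  act(instruction: string): Promise<void>;
  assert(expectation: string): Promise<void>;
}

// Stand-in agent that only records the steps it is given. A real
// vision-driven agent would instead take a screenshot, ask a vision model
// where to click or type, and then perform that action.
class RecordingAgent implements VisionAgent {
  readonly steps: Step[] = [];

  async act(instruction: string): Promise<void> {
    this.steps.push({ kind: "act", instruction });
  }

  async assert(expectation: string): Promise<void> {
    this.steps.push({ kind: "assert", expectation });
  }
}

// The test flow contains no selectors, XPath, or element IDs.
async function searchFlow(agent: VisionAgent): Promise<void> {
  await agent.act('type "wireless headphones" in the search box and press Enter');
  await agent.act("click the first product card");
  await agent.assert("the product page shows a price and an Add to cart button");
}
```

Because the flow refers only to what is visible on screen, a markup refactor that leaves the page looking the same would not require changing it.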
Midscene is built for UI testing first, but the same vision-driven engine handles any UI automation task.
midscene is an open-source project written primarily in TypeScript, with 14k stars on GitHub. It was last updated in July 2026.
midscene vs. the alternatives
| Agent | Stars | Language | License | Pricing |
|---|---|---|---|---|
| midscene | 14k | TypeScript | MIT | Open source |
| UI-TARS-desktop | 38k | TypeScript | Apache-2.0 | Open source |
| skyvern | 22k | Python | AGPL-3.0 | Open source |
| page-agent | 22k | TypeScript | MIT | Open source |
| nanobrowser | 13k | TypeScript | Apache-2.0 | Open source |
| Agent-S | 12k | Python | Apache-2.0 | Open source |
