I built an AI agent that controls any Mac app
mac-use lets Claude Code control any macOS app through accessibility APIs — no screenshots, no pixel coordinates. I used it to automate my Swiss tax return.
By Alex Diaz · Updated March 30, 2026
TL;DR: mac-use is an open-source MCP server that lets Claude Code control any macOS app — clicking buttons, filling forms, reading values, navigating menus — through Apple’s Accessibility API. No screenshots. No pixel coordinates. I built it to automate my Swiss tax return. It works for any desktop app you’re tired of clicking through manually.
Filing taxes in Switzerland means eTax — a desktop app from 2003 that looks like it. Dozens of tabs. Hundreds of fields. Dropdowns that require scrolling through every Swiss municipality. Every year, same ritual: open the app, manually type numbers from documents I already have digitally, click through the same sequences, lose an afternoon to a GUI.
Switzerland isn’t special here. The US has TurboTax. The UK has HMRC’s desktop tools. France has the impots.gouv client. Italy has Entratel. Every country with a tax authority has some legacy desktop app that hasn’t been meaningfully updated since the mid-2000s. The forms change yearly. The software doesn’t.
This year I automated the entire thing with mac-use. It joins the stack of AI skills that let a 5-person team operate without departments.
Key takeaways:
- mac-use connects Claude Code to any macOS application through Apple’s native Accessibility API
- It reads UI elements, clicks buttons, fills forms, navigates menus — no vision models, no screenshots
- Desktop apps that can’t be scripted (tax software, legacy enterprise tools, government forms) become automatable
- Install: `pip install mac-use` — one line, zero configuration beyond macOS accessibility permissions
- Open source: works with any app that renders standard macOS UI elements
The problem nobody talks about
Every founder has at least one desktop app that eats hours and can’t be automated.
Banking portals. Government submission tools. The accounting package your bookkeeper insists on. Insurance claim forms. Customs declaration apps. They all share the same design philosophy: information goes in through a human clicking things. There’s no API. No CLI. No export button that gives you what you actually need.
The web got APIs. Mobile got deep links. Desktop apps got left behind. Your data goes in through the front door — the GUI — or it doesn’t go in at all.
The standard solutions:
| Approach | Problem |
|---|---|
| Screenshot-based AI (GPT-4V, etc.) | Slow, expensive, resolution-dependent, breaks at different screen sizes |
| AppleScript / Automator | Requires deep macOS scripting knowledge, brittle, hard to debug |
| Keyboard Maestro / similar | Record-and-replay — breaks the moment a button moves |
| RPA tools (UiPath, etc.) | Enterprise pricing, enterprise complexity, enterprise sales cycle |
| Manual entry | Works. Costs you hours you don’t have. |
Every option is either fragile, expensive, or requires expertise that has nothing to do with the actual task.
What mac-use does differently
mac-use is an MCP server — a bridge between Claude Code and your Mac’s Accessibility API. Instead of taking screenshots and guessing where buttons are, it reads the actual UI element tree. The same tree that VoiceOver uses to help blind users navigate macOS.
This means Claude Code can:
- See every button, text field, dropdown, tab, and menu item — by name, role, and state
- Click elements by what they’re called, not where they are on screen
- Read values from any UI element — text fields, labels, status indicators
- Type into specific fields without needing to tab through an entire form
- Fill multiple form fields in a single operation
- Navigate menus by path: File > Export > PDF
- Press keyboard shortcuts — Cmd+S, Cmd+P, whatever the app supports
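In code terms, the element tree is what the agent targets instead of coordinates. Here is a minimal sketch of the idea; the `UIElement` shape and the `find_by_name` helper are illustrative stand-ins, not mac-use's actual API:

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    """One node of an accessibility tree: role, name, value, children."""
    role: str            # e.g. "AXButton", "AXTextField"
    name: str            # the accessible title or label
    value: str = ""      # current contents (for fields, labels)
    children: tuple = ()

def find_by_name(root: UIElement, role: str, name: str):
    """Depth-first lookup by role and accessible name, not screen position."""
    if root.role == role and root.name == name:
        return root
    for child in root.children:
        hit = find_by_name(child, role, name)
        if hit is not None:
            return hit
    return None

# A toy tree: a window containing a labeled field and a button.
window = UIElement("AXWindow", "eTax", children=(
    UIElement("AXTextField", "Gross income", value="0"),
    UIElement("AXButton", "Next"),
))

field = find_by_name(window, "AXTextField", "Gross income")
print(field.value)  # "0", no matter where the field sits on screen
```

Move the button, resize the window, change the display scaling: the lookup still resolves, because it keys on role and name rather than pixels.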
No vision model in the loop. No per-action API calls to GPT-4V at $0.01/image. Text-only interactions with the accessibility layer that Apple built into every Mac app.
The difference matters:
| | Screenshot-based | mac-use (Accessibility API) |
|---|---|---|
| Speed | 3-5s per action (vision model) | Instant (direct API call) |
| Accuracy | Pixel-dependent, breaks with scaling | Element-based, scale-independent |
| Cost | $0.01-0.03 per screenshot | Zero marginal cost |
| Resolution | Fails on Retina vs. non-Retina | Resolution-irrelevant |
| Reliability | Breaks when UI shifts 10px | Works if the element exists |
How I automated eTax
Here’s what the tax return workflow actually looks like:
- Claude Code reads my income documents (PDFs, CSVs from my accounting software)
- It opens eTax and navigates to the first section
- For each section — income, deductions, assets, liabilities — it identifies the form fields by name, fills them with the correct values, and moves to the next tab
- It handles the dropdowns (municipality selection, canton codes, profession categories) by reading the available options and selecting the right one
- When it encounters a field it can’t fill with certainty, it stops and asks
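The steps above amount to a fill-and-verify loop. The sketch below uses hypothetical stand-ins throughout: `fill_field`, `read_field`, and `ask_user` represent whatever tool calls the agent makes, not mac-use's real interface, and the field mapping is invented:

```python
def fill_section(fields: dict, extracted: dict, fill_field, read_field, ask_user):
    """Fill each form field, verify the write, stop on uncertainty."""
    for field_name, source_key in fields.items():
        value = extracted.get(source_key)
        if value is None:
            # Step 5: a value we can't determine with certainty -> ask.
            value = ask_user(f"No value found for {field_name!r}; please provide one.")
        fill_field(field_name, value)
        # Read the field back to catch rejected or truncated input.
        if read_field(field_name) != value:
            raise RuntimeError(f"Verification failed for {field_name!r}")

# Exercise the loop against an in-memory "form" instead of a real app.
form = {}
fill_section(
    {"Gross income": "salary", "Municipality": "commune"},
    {"salary": "98400", "commune": "Lausanne"},
    fill_field=form.__setitem__,
    read_field=form.__getitem__,
    ask_user=lambda prompt: "0",
)
print(form)  # {'Gross income': '98400', 'Municipality': 'Lausanne'}
```

The read-back check is the part worth copying: writing a value and then reading it back through the same accessibility layer catches fields that silently reject or truncate input.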
The whole thing runs in Claude Code. I describe what I want: “Fill my Swiss cantonal tax return using the data from these documents.” Claude Code orchestrates the rest — reading documents, controlling eTax through mac-use, validating as it goes.
Total time: under 10 minutes for what used to take an afternoon. And the accuracy is higher because there’s no manual transcription — numbers go straight from source documents to form fields.
Beyond tax returns
Tax software was the trigger. But once you have an AI agent that can control any desktop app, the use cases multiply:
Data extraction from locked apps. That legacy CRM your company uses? The one with no export feature? mac-use can read every field, every table, every record — and dump it into whatever format you need. No screen scraping. No OCR on screenshots. Direct element reading.
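As a sketch of that pipeline, assume the agent has already read each record's fields out of the element tree into plain dicts (the sample records here are invented); the rest is ordinary serialization:

```python
import csv
import io

# Field label -> displayed value, as read from the accessibility tree.
records = [
    {"Name": "Acme AG", "City": "Zurich", "Status": "Active"},
    {"Name": "Globex SA", "City": "Geneva", "Status": "Dormant"},
]

# Dump the extracted records to CSV, ready for import elsewhere.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Name", "City", "Status"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```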
Cross-application workflows. Copy data from your accounting software into your banking portal. Transfer entries from one system to another. The kind of work that an assistant would do — opening two apps, reading from one, typing into the other — except the assistant is Claude Code and it doesn’t make transcription errors.
Automated form submission. Insurance claims. Government applications. Permit renewals. Customs declarations. Any form-heavy desktop application where you’re typing the same categories of information from documents you already have digitally.
Testing and QA. If you build macOS apps, mac-use gives your AI agent the ability to navigate your app, check element states, verify form validation, and report what it finds. Accessibility compliance testing becomes trivial — if mac-use can’t find an element, neither can VoiceOver.
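A smoke test for that property might look like the following; the required elements and the flat (role, name) tree format are made up for illustration:

```python
# If these lookups fail, the element isn't exposed to the accessibility
# tree, so VoiceOver (and any agent reading that tree) can't find it either.
REQUIRED = [("AXButton", "Submit"), ("AXTextField", "Email")]

def audit(tree: list) -> list:
    """Return the required (role, name) pairs missing from the tree."""
    return [pair for pair in REQUIRED if pair not in tree]

# The elements your app actually exposes, as (role, name) pairs.
exposed = [("AXTextField", "Email"), ("AXButton", "Submit"), ("AXButton", "Cancel")]
print(audit(exposed))  # [] -> every required element is reachable
```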
Setup
One line:
```shell
pip install mac-use
```
Add to your Claude Code config (~/.claude.json):
```json
{
  "mcpServers": {
    "mac-use": {
      "command": "mac-use"
    }
  }
}
```
Grant accessibility permissions to your terminal (System Settings > Privacy & Security > Accessibility). That’s it.
Then tell Claude Code what you want done: “Open eTax and fill in my income from this PDF.” It handles the rest — identifying windows, reading UI elements, clicking, typing, navigating.
Requirements: macOS Ventura or later. Python 3.10+. The target app needs to render standard macOS UI elements (most apps do — even Electron apps expose accessibility trees).
How it works under the hood
mac-use talks to macOS through osascript and System Events — the same mechanism that powers VoiceOver, Switch Control, and every other accessibility feature on the Mac.
When Claude Code calls get_ui_elements(), mac-use returns the full element tree: button names, text field values, dropdown options, tab labels, menu structures. Claude Code uses this tree to decide what to interact with next. It’s not guessing from pixels. It’s reading the actual interface description that the app provides to the operating system.
This is why it works reliably. Apple requires apps to expose accessibility information. It’s been a platform requirement for years. mac-use just uses that information for automation instead of disability access. Same API, different purpose.
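For a feel of the mechanism, here is a sketch of querying System Events through osascript directly. The exact query is illustrative, and `list_buttons` only runs on macOS with accessibility permission granted, so the example below exercises only the string building and parsing:

```python
import subprocess

def ui_elements_script(process_name: str) -> str:
    """AppleScript that asks System Events for the buttons of an app's
    front window: the same accessibility data VoiceOver reads."""
    return (
        'tell application "System Events" to '
        f'get name of every button of window 1 of process "{process_name}"'
    )

def parse_reply(reply: str) -> list:
    """osascript returns a comma-separated list; split it into names."""
    return [name.strip() for name in reply.split(",") if name.strip()]

def list_buttons(process_name: str) -> list:
    """Run the query on macOS. Requires accessibility permission."""
    out = subprocess.run(
        ["osascript", "-e", ui_elements_script(process_name)],
        capture_output=True, text=True, check=True,
    )
    return parse_reply(out.stdout)

# Parsing the kind of reply osascript produces for a Finder window:
print(parse_reply("Back, Forward, View Options"))
```

mac-use wraps this kind of query, walks the full tree rather than one role at a time, and exposes the result as MCP tools; the point is that everything rests on an OS-level API, not on pixels.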
FAQ
Does it work with any macOS app?
Any app that exposes standard accessibility elements — which is most of them. Native macOS apps (Swift/AppKit/SwiftUI) work best. Electron apps work. Java apps work. Some heavily custom-rendered UIs (games, some creative tools) may not expose element trees.
How is this different from Keyboard Maestro or Automator?
Those tools record and replay fixed sequences. mac-use gives an AI agent dynamic understanding of the interface. It reads the current state, decides what to do next, handles variations, and recovers from unexpected states. It’s the difference between a script and an agent.
Is it safe for sensitive data like tax returns?
Everything runs locally. mac-use is an MCP server on your machine — it talks to your apps through your OS. No data leaves your Mac (beyond what Claude Code sends to the API for reasoning). Your tax data stays local.
Can I use it without Claude Code?
mac-use is an MCP server, so it works with any MCP-compatible client. Claude Code is the most natural fit because it can orchestrate multi-step workflows, but any client that speaks MCP can use it.
What about Windows or Linux?
Coming. Windows has UI Automation and Linux has AT-SPI — same concept, different implementations. Send me a message if you want it ASAP.
Desktop apps were the last holdout — the software that couldn’t be automated because it was never designed to be. mac-use changes that. Clone it, point it at whatever app wastes your time, and let Claude Code do the clicking.