I built an AI agent that controls any Mac app
mac-use lets Claude Code control any macOS app through accessibility APIs — no screenshots, no pixel coordinates. I used it to automate my Swiss tax return.
By Alex Diaz · Updated March 30, 2026
TL;DR: mac-use is an open-source MCP server that lets Claude Code control any macOS app — clicking buttons, filling forms, reading values, navigating menus — through Apple’s Accessibility API. No screenshots. No pixel coordinates. I built it to automate my Swiss tax return. It works for any desktop app you’re tired of clicking through manually.
Filing taxes in Switzerland means eTax — a desktop app from 2003 that looks like it. Dozens of tabs. Hundreds of fields. Dropdowns that require scrolling through every Swiss municipality. Every year, same ritual: open the app, manually type numbers from documents I already have digitally, click through the same sequences, lose an afternoon to a GUI.
Switzerland isn’t special here. The US has TurboTax. The UK has HMRC’s desktop tools. France has the impots.gouv client. Italy has Entratel. Every country with a tax authority has some legacy desktop app that hasn’t been meaningfully updated since the mid-2000s. The forms change yearly. The software doesn’t.
This year I automated the entire thing with mac-use. It joins the stack of AI skills that let a 5-person team operate without departments.
Key takeaways:
- mac-use connects Claude Code to any macOS application through Apple’s native Accessibility API
- It reads UI elements, clicks buttons, fills forms, navigates menus — no vision models, no screenshots
- Desktop apps that can’t be scripted (tax software, legacy enterprise tools, government forms) become automatable
- Install: `pip install mac-use` — one line, zero configuration beyond macOS accessibility permissions
- Open source: works with any app that renders standard macOS UI elements
The problem nobody talks about
Every founder has at least one desktop app that eats hours and can’t be automated.
Banking portals. Government submission tools. The accounting package your bookkeeper insists on. Insurance claim forms. Customs declaration apps. They all share the same design philosophy: information goes in through a human clicking things. There’s no API. No CLI. No export button that gives you what you actually need.
The web got APIs. Mobile got deep links. Desktop apps got left behind. Your data goes in through the front door — the GUI — or it doesn’t go in at all.
The standard solutions:
| Approach | Problem |
|---|---|
| Screenshot-based AI (GPT-4V, etc.) | Slow, expensive, resolution-dependent, breaks at different screen sizes |
| AppleScript / Automator | Requires deep macOS scripting knowledge, brittle, hard to debug |
| Keyboard Maestro / similar | Record-and-replay — breaks the moment a button moves |
| RPA tools (UiPath, etc.) | Enterprise pricing, enterprise complexity, enterprise sales cycle |
| Manual entry | Works. Costs you hours you don’t have. |
Every option is either fragile, expensive, or requires expertise that has nothing to do with the actual task.
What mac-use does differently
mac-use is an MCP server — a bridge between Claude Code and your Mac’s Accessibility API. Instead of taking screenshots and guessing where buttons are, it reads the actual UI element tree. The same tree that VoiceOver uses to help blind users navigate macOS.
This means Claude Code can:
- See every button, text field, dropdown, tab, and menu item — by name, role, and state
- Click elements by what they’re called, not where they are on screen
- Read values from any UI element — text fields, labels, status indicators
- Type into specific fields without needing to tab through an entire form
- Fill multiple form fields in a single operation
- Navigate menus by path: File > Export > PDF
- Press keyboard shortcuts — Cmd+S, Cmd+P, whatever the app supports
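In code terms, the element tree is what the agent targets instead of coordinates. Here is a minimal sketch of the idea; the `UIElement` shape and the `find_by_name` helper are illustrative stand-ins, not mac-use's actual API:

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    """One node of an accessibility tree: role, name, value, children."""
    role: str            # e.g. "AXButton", "AXTextField"
    name: str            # the accessible title or label
    value: str = ""      # current contents (for fields, labels)
    children: tuple = ()

def find_by_name(root: UIElement, role: str, name: str):
    """Depth-first lookup by role and accessible name, not screen position."""
    if root.role == role and root.name == name:
        return root
    for child in root.children:
        hit = find_by_name(child, role, name)
        if hit is not None:
            return hit
    return None

# A toy tree: a window containing a labeled field and a button.
window = UIElement("AXWindow", "eTax", children=(
    UIElement("AXTextField", "Gross income", value="0"),
    UIElement("AXButton", "Next"),
))

field = find_by_name(window, "AXTextField", "Gross income")
print(field.value)  # "0", no matter where the field sits on screen
```

Move the button, resize the window, change the display scaling: the lookup still resolves, because it keys on role and name rather than pixels.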
No vision model in the loop. No per-action API calls to GPT-4V at $0.01/image. Text-only interactions with the accessibility layer that Apple built into every Mac app.
The difference matters:
| | Screenshot-based | mac-use (Accessibility API) |
|---|---|---|
| Speed | 3-5s per action (vision model) | Instant (direct API call) |
| Accuracy | Pixel-dependent, breaks with scaling | Element-based, scale-independent |
| Cost | $0.01-0.03 per screenshot | Zero marginal cost |
| Resolution | Fails on Retina vs. non-Retina | Resolution-irrelevant |
| Reliability | Breaks when UI shifts 10px | Works if the element exists |
How I automated eTax
Here’s what the tax return workflow actually looks like:
- Claude Code reads my income documents (PDFs, CSVs from my accounting software)
- It opens eTax and navigates to the first section
- For each section — income, deductions, assets, liabilities — it identifies the form fields by name, fills them with the correct values, and moves to the next tab
- It handles the dropdowns (municipality selection, canton codes, profession categories) by reading the available options and selecting the right one
- When it encounters a field it can’t fill with certainty, it stops and asks
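The steps above amount to a fill-and-verify loop. The sketch below uses hypothetical stand-ins throughout: `fill_field`, `read_field`, and `ask_user` represent whatever tool calls the agent makes, not mac-use's real interface, and the field mapping is invented:

```python
def fill_section(fields: dict, extracted: dict, fill_field, read_field, ask_user):
    """Fill each form field, verify the write, stop on uncertainty."""
    for field_name, source_key in fields.items():
        value = extracted.get(source_key)
        if value is None:
            # Step 5: a value we can't determine with certainty -> ask.
            value = ask_user(f"No value found for {field_name!r}; please provide one.")
        fill_field(field_name, value)
        # Read the field back to catch rejected or truncated input.
        if read_field(field_name) != value:
            raise RuntimeError(f"Verification failed for {field_name!r}")

# Exercise the loop against an in-memory "form" instead of a real app.
form = {}
fill_section(
    {"Gross income": "salary", "Municipality": "commune"},
    {"salary": "98400", "commune": "Lausanne"},
    fill_field=form.__setitem__,
    read_field=form.__getitem__,
    ask_user=lambda prompt: "0",
)
print(form)  # {'Gross income': '98400', 'Municipality': 'Lausanne'}
```

The read-back check is the part worth copying: writing a value and then reading it back through the same accessibility layer catches fields that silently reject or truncate input.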
The whole thing runs in Claude Code. I describe what I want: “Fill my Swiss cantonal tax return using the data from these documents.” Claude Code orchestrates the rest — reading documents, controlling eTax through mac-use, validating as it goes.
Total time: under 10 minutes for what used to take an afternoon. And the accuracy is higher because there’s no manual transcription — numbers go straight from source documents to form fields.
Beyond tax returns
Tax software was the trigger. But once you have an AI agent that can control any desktop app, the use cases multiply:
Data extraction from locked apps. That legacy CRM your company uses? The one with no export feature? mac-use can read every field, every table, every record — and dump it into whatever format you need. No screen scraping. No OCR on screenshots. Direct element reading.
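As a sketch of that pipeline, assume the agent has already read each record's fields out of the element tree into plain dicts (the sample records here are invented); the rest is ordinary serialization:

```python
import csv
import io

# Field label -> displayed value, as read from the accessibility tree.
records = [
    {"Name": "Acme AG", "City": "Zurich", "Status": "Active"},
    {"Name": "Globex SA", "City": "Geneva", "Status": "Dormant"},
]

# Dump the extracted records to CSV, ready for import elsewhere.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["Name", "City", "Status"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```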
Cross-application workflows. Copy data from your accounting software into your banking portal. Transfer entries from one system to another. The kind of work that an assistant would do — opening two apps, reading from one, typing into the other — except the assistant is Claude Code and it doesn’t make transcription errors.
Automated form submission. Insurance claims. Government applications. Permit renewals. Customs declarations. Any form-heavy desktop application where you’re typing the same categories of information from documents you already have digitally.
Testing and QA. If you build macOS apps, mac-use gives your AI agent the ability to navigate your app, check element states, verify form validation, and report what it finds. Accessibility compliance testing becomes trivial — if mac-use can’t find an element, neither can VoiceOver.
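A smoke test for that property might look like the following; the required elements and the flat (role, name) tree format are made up for illustration:

```python
# If these lookups fail, the element isn't exposed to the accessibility
# tree, so VoiceOver (and any agent reading that tree) can't find it either.
REQUIRED = [("AXButton", "Submit"), ("AXTextField", "Email")]

def audit(tree: list) -> list:
    """Return the required (role, name) pairs missing from the tree."""
    return [pair for pair in REQUIRED if pair not in tree]

# The elements your app actually exposes, as (role, name) pairs.
exposed = [("AXTextField", "Email"), ("AXButton", "Submit"), ("AXButton", "Cancel")]
print(audit(exposed))  # [] -> every required element is reachable
```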
Setup
One line:
```shell
pip install mac-use
```
Add to your Claude Code config (~/.claude.json):
```json
{
  "mcpServers": {
    "mac-use": {
      "command": "mac-use"
    }
  }
}
```
Grant accessibility permissions to your terminal (System Settings > Privacy & Security > Accessibility). That’s it.
Then tell Claude Code what you want done: “Open eTax and fill in my income from this PDF.” It handles the rest — identifying windows, reading UI elements, clicking, typing, navigating.
Requirements: macOS Ventura or later. Python 3.10+. The target app needs to render standard macOS UI elements (most apps do — even Electron apps expose accessibility trees).
How it works under the hood
mac-use talks to macOS through osascript and System Events — the same mechanism that powers VoiceOver, Switch Control, and every other accessibility feature on the Mac.
When Claude Code calls get_ui_elements(), mac-use returns the full element tree: button names, text field values, dropdown options, tab labels, menu structures. Claude Code uses this tree to decide what to interact with next. It’s not guessing from pixels. It’s reading the actual interface description that the app provides to the operating system.
This is why it works reliably. Apple requires apps to expose accessibility information. It’s been a platform requirement for years. mac-use just uses that information for automation instead of disability access. Same API, different purpose.
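For a feel of the mechanism, here is a sketch of querying System Events through osascript directly. The exact query is illustrative, and `list_buttons` only runs on macOS with accessibility permission granted, so the example below exercises only the string building and parsing:

```python
import subprocess

def ui_elements_script(process_name: str) -> str:
    """AppleScript that asks System Events for the buttons of an app's
    front window: the same accessibility data VoiceOver reads."""
    return (
        'tell application "System Events" to '
        f'get name of every button of window 1 of process "{process_name}"'
    )

def parse_reply(reply: str) -> list:
    """osascript returns a comma-separated list; split it into names."""
    return [name.strip() for name in reply.split(",") if name.strip()]

def list_buttons(process_name: str) -> list:
    """Run the query on macOS. Requires accessibility permission."""
    out = subprocess.run(
        ["osascript", "-e", ui_elements_script(process_name)],
        capture_output=True, text=True, check=True,
    )
    return parse_reply(out.stdout)

# Parsing the kind of reply osascript produces for a Finder window:
print(parse_reply("Back, Forward, View Options"))
```

mac-use wraps this kind of query, walks the full tree rather than one role at a time, and exposes the result as MCP tools; the point is that everything rests on an OS-level API, not on pixels.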
FAQ
Does it work with any macOS app?
Any app that exposes standard accessibility elements — which is most of them. Native macOS apps (Swift/AppKit/SwiftUI) work best. Electron apps work. Java apps work. Some heavily custom-rendered UIs (games, some creative tools) may not expose element trees.
How is this different from Keyboard Maestro or Automator?
Those tools record and replay fixed sequences. mac-use gives an AI agent dynamic understanding of the interface. It reads the current state, decides what to do next, handles variations, and recovers from unexpected states. It’s the difference between a script and an agent.
Is it safe for sensitive data like tax returns?
Everything runs locally. mac-use is an MCP server on your machine — it talks to your apps through your OS. No data leaves your Mac (beyond what Claude Code sends to the API for reasoning). Your tax data stays local.
Can I use it without Claude Code?
mac-use is an MCP server, so it works with any MCP-compatible client. Claude Code is the most natural fit because it can orchestrate multi-step workflows, but any client that speaks MCP can use it.
What about Windows or Linux?
Coming. Windows has UI Automation and Linux has AT-SPI — same concept, different implementations. Send me a message if you want it ASAP.
Desktop apps were the last holdout — the software that couldn’t be automated because it was never designed to be. mac-use changes that. Clone it, point it at whatever app wastes your time, and let Claude Code do the clicking.