mlx-serve
Native LLM inference server for Apple Silicon. OpenAI + Anthropic API compatible. No Python. Includes MLX Core macOS app with chat, agent m…
About mlx-serve
OpenAI- and Anthropic-compatible local inference for Apple Silicon — MLX and GGUF — faster than LM Studio on the same file. No Python. No cloud. No Electron.
mlx-serve is a native Zig server that runs any LLM on Apple Silicon — MLX-format models and every GGUF on HuggingFace (Qwen, Llama, Mistral, Gemma, DeepSeek V4 Flash, thousands more). It exposes OpenAI-compatible and Anthropic-compatible HTTP APIs out of the box, so the same http://localhost:11234 works with Claude Code, the OpenAI SDK, Continue, Cursor, Open WebUI, and anything else that speaks one of those wires. Ships with MLX Core, a macOS menu-bar app with chat, agent mode, MCP tool call…
Short names, org/repo HuggingFace ids, and name:tag all work. And because mlx-serve speaks the Ollama API (/api/chat, /api/generate, /api/tags, /api/embed, /api/pull, …) alongside OpenAI and Anthropic, your existing Ollama-connected tools — Raycast, Obsidian, Enchanted, Open WebUI, ollama-python/js — work unchanged: point them at http://localhost:11234 and keep your workflow, on a faster engine.
mlx-serve is an open-source project written primarily in Zig, with 199 stars on GitHub. It was last updated in June 2026.
brew install --cask mlx-core # GUI menu bar appmlx-serve vs. the alternatives
All agent infrastructure →| Agent | Stars | Pricing | ||
|---|---|---|---|---|
| mlx-serve | 199 | Zig | MIT | Open source |
| daytona | 72k | — | — | Open source |
| mem0 | 60k | Python | Apache-2.0 | Open source |
| cua | 19k | HTML | MIT | Open source |
| gateway | 12k | TypeScript | MIT | Open source |
| steel-browser | 7.3k | TypeScript | Apache-2.0 | Open source |
