Record / mlx-serveInfrastructureOpen sourceVerified

mlx-serve

Native LLM inference server for Apple Silicon. OpenAI + Anthropic API compatible. No Python. Includes MLX Core macOS app with chat, agent m…

Visit mlx-serve →View source

About mlx-serve

OpenAI- and Anthropic-compatible local inference for Apple Silicon — MLX and GGUF — faster than LM Studio on the same file. No Python. No cloud. No Electron.

mlx-serve is a native Zig server that runs any LLM on Apple Silicon — MLX-format models and every GGUF on HuggingFace (Qwen, Llama, Mistral, Gemma, DeepSeek V4 Flash, thousands more). It exposes OpenAI-compatible and Anthropic-compatible HTTP APIs out of the box, so the same http://localhost:11234 works with Claude Code, the OpenAI SDK, Continue, Cursor, Open WebUI, and anything else that speaks one of those wires. Ships with MLX Core, a macOS menu-bar app with chat, agent mode, MCP tool call…

Short names, org/repo HuggingFace ids, and name:tag all work. And because mlx-serve speaks the Ollama API (/api/chat, /api/generate, /api/tags, /api/embed, /api/pull, …) alongside OpenAI and Anthropic, your existing Ollama-connected tools — Raycast, Obsidian, Enchanted, Open WebUI, ollama-python/js — work unchanged: point them at http://localhost:11234 and keep your workflow, on a faster engine.

From the project's README

mlx-serve is an open-source project written primarily in Zig, with 199 stars on GitHub. It was last updated in June 2026.

Install

brew install --cask mlx-core   # GUI menu bar app

Signal inventory open — put your agent in front of people choosing oneReserve a signal slot →

mlx-serve vs. the alternatives

All agent infrastructure →

Agent	Stars	Language	License	Pricing
mlx-serveInfrastructurethis listing	199	Zig	MIT	Open source
daytonaInfrastructure	72k	—	—	Open source
mem0Infrastructure	60k	Python	Apache-2.0	Open source
cuaInfrastructure	19k	HTML	MIT	Open source
gatewayInfrastructure	12k	TypeScript	MIT	Open source
steel-browserInfrastructure	7.3k	TypeScript	Apache-2.0	Open source