
LLMVoX

LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM

About LLMVoX

Sambal Shikhar, Mohammed Irfan K, Sahal Shaji Mullappilly, Fahad Khan, Jean Lahoud, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal

LLMVoX is a lightweight 30M-parameter, LLM-agnostic, autoregressive streaming Text-to-Speech (TTS) system designed to convert text outputs from Large Language Models into high-fidelity streaming speech with low latency. Our approach achieves a significantly lower Word Error Rate than speech-enabled LLMs while operating at comparable latency and speech quality.

Key features:
- Lightweight & Fast: Only 30M parameters, delivering speech with end-to-end latency as low as 300ms
- LLM-Agnostic: Plugs into any existing LLM or Vision-Language Model without requiring fine-tuning or architectural modifications
- Multi-Queue Streaming: Enables continuous, low-latency speech generation and infinite-length dialogues
- Multilingual Support: Easily adaptable to new languages with only dataset adaptation
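
The multi-queue streaming feature listed above can be illustrated with a small sketch. This is not LLMVoX's actual code: `split_sentences`, `fake_tts`, and `stream_speech` are hypothetical names, and `fake_tts` is a stand-in that returns a text label instead of audio. The sketch assumes the general idea is to cut the LLM's text stream into sentences, hand them to alternating queues so one sentence can be synthesized while another is played, and read the results back in the original order.

```python
import queue
import re
import threading


def split_sentences(chunks):
    """Group streamed text chunks into sentences ending in ., ! or ?."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every sentence that is followed by whitespace.
        while (m := re.search(r"[.!?]\s+", buffer)):
            yield buffer[:m.end()].strip()
            buffer = buffer[m.end():]
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left at end of stream


def fake_tts(sentence):
    # Hypothetical stand-in for a TTS model: a label instead of audio.
    return f"<audio:{sentence}>"


def _worker(in_q, out_q, synthesize):
    # Synthesize sentences from one queue until the None sentinel arrives.
    while True:
        sentence = in_q.get()
        if sentence is None:
            out_q.put(None)
            return
        out_q.put(synthesize(sentence))


def stream_speech(chunks, synthesize=fake_tts, n_queues=2):
    """Yield synthesized sentences in order, using n_queues parallel workers."""
    in_qs = [queue.Queue() for _ in range(n_queues)]
    out_qs = [queue.Queue() for _ in range(n_queues)]
    for in_q, out_q in zip(in_qs, out_qs):
        threading.Thread(
            target=_worker, args=(in_q, out_q, synthesize), daemon=True
        ).start()

    def feed():
        # Sentence i goes to queue i % n_queues (round-robin).
        for i, sentence in enumerate(split_sentences(chunks)):
            in_qs[i % n_queues].put(sentence)
        for q in in_qs:
            q.put(None)

    threading.Thread(target=feed, daemon=True).start()

    # Read results round-robin so the output order matches the text order.
    i = 0
    while True:
        audio = out_qs[i % n_queues].get()
        if audio is None:
            break
        yield audio
        i += 1


if __name__ == "__main__":
    for piece in stream_speech(["Hello the", "re. How are", " you? Fine", "."]):
        print(piece)
```

Because the consumer starts yielding as soon as the first sentence is ready, playback can begin before the LLM has finished generating, which is the property a streaming design like this is after.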

From the project's README

LLMVoX is an open-source project written primarily in Python, with 308 stars on GitHub. It was last updated in May 2025.

Install

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118


LLMVoX vs. the alternatives


Agent	Type	Stars	Language	License	Pricing
LLMVoX (this listing)	SDK / library	308	Python	—	Open source
xiaozhi-esp32-server	Infrastructure	10.0k	JavaScript	MIT	Open source
ten-vad	SDK / library	2.2k	C	—	Open source
bailing	Agent	1.7k	Python	MIT	Open source
RCLI	Agent	1.5k	C++	MIT	Open source
CyberVerse	Platform	1.4k	Python	GPL-3.0	Open source