Build Your Own AI Voice Agent for Free with Pipecat
Did you know you can build a real-time AI voice agent without paying for an expensive voice agent platform? Pipecat is an open-source Python framework for building real-time voice and multimodal AI agents. Instead of wiring speech-to-text, AI models, and voice generation services together by hand, you let Pipecat orchestrate them through a low-latency pipeline designed for natural conversations. Whether you're building an AI receptionist, an appointment booking assistant, a customer support agent, or a phone-based AI assistant, Pipecat provides the tools to get started quickly.
Key Features
- Completely open source
- Real-time voice conversations
- Supports OpenAI, Gemini, Claude, and local LLMs
- Works with multiple speech-to-text providers
- Supports various text-to-speech engines
- WebRTC support for low-latency communication
- Multi-agent workflows
- Telephony integrations
- Highly customizable pipelines
- Production-ready architecture
What Can You Build?
Pipecat can be used to create:
- AI Receptionists
- Customer Support Agents
- Appointment Booking Assistants
- Lead Qualification Agents
- Recruitment Assistants
- Internal Company Assistants
- AI Phone Agents
- Voice-Based SaaS Products
- Multimodal Voice + Video Applications
How Pipecat Works
Pipecat connects multiple AI services into a real-time conversational pipeline.
Voice Pipeline
User Speaks
↓
Speech-to-Text (STT)
↓
Large Language Model (LLM)
↓
Text-to-Speech (TTS)
↓
Voice Response
A typical interaction follows this flow:
- User speaks through a browser, mobile app, or phone call.
- Speech-to-text converts audio into text.
- The AI model processes the request.
- Text-to-speech converts the response into audio.
- The response is streamed back to the user.
Pipecat manages this entire pipeline for you, streaming data between the stages to keep latency low and conversations natural.
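To make the flow above concrete, here is a minimal sketch of one conversational turn in plain Python. It does not use Pipecat's own API: `fake_stt`, `fake_llm`, `fake_tts`, `run_turn`, and `collect` are hypothetical stand-ins invented for illustration, and the "audio" is just UTF-8 text. The point is only the shape of the pipeline, with the last stage yielding chunks so the response can be streamed.

```python
import asyncio
from typing import AsyncIterator

# Hypothetical stand-ins for real services. None of these are Pipecat APIs.

async def fake_stt(audio: bytes) -> str:
    """Pretend to transcribe audio by decoding it as UTF-8 text."""
    return audio.decode("utf-8")

async def fake_llm(text: str) -> str:
    """Pretend to generate a reply from the transcript."""
    return f"You said: {text}"

async def fake_tts(text: str) -> AsyncIterator[bytes]:
    """Pretend to synthesize speech, yielding one chunk per word."""
    for word in text.split():
        yield word.encode("utf-8")

async def run_turn(audio: bytes) -> AsyncIterator[bytes]:
    """Run one turn: STT, then LLM, then TTS, streaming chunks out."""
    transcript = await fake_stt(audio)
    reply = await fake_llm(transcript)
    async for chunk in fake_tts(reply):
        yield chunk

async def collect(audio: bytes) -> list[bytes]:
    """Gather every streamed chunk into a list (handy for testing)."""
    return [chunk async for chunk in run_turn(audio)]

if __name__ == "__main__":
    print(asyncio.run(collect(b"hello there")))
```

In a real agent each stand-in would be a network call to a provider, and the chunks would be audio frames sent to the user as soon as they arrive rather than collected into a list.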
Prerequisites
Before creating your first voice agent, install the following:
Python
Pipecat requires Python 3.11 or newer.
python --version
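If you would rather check the interpreter from a script (for example in a setup step), the following is a small sketch. The minimum of 3.11 is taken from the requirement stated above, and `meets_minimum` is a hypothetical helper name, not part of Pipecat.

```python
import sys

# Minimum version as stated in this article; adjust if the requirement changes.
MIN_VERSION = (3, 11)

def meets_minimum(version=sys.version_info, minimum=MIN_VERSION) -> bool:
    """Return True when the (major, minor) version is at least the minimum."""
    return tuple(version[:2]) >= minimum

if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough.")
    else:
        print(f"Python {current} is too old; install 3.11 or newer.")
```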
uv Package Manager
Install uv:
pip install uv
Or:
curl -LsSf https://astral.sh/uv/install.sh | sh
Step 1 – Install Pipecat CLI
Pipecat now provides a CLI that can generate complete voice agent projects automatically.
Install the CLI:
uv tool install pipecat-ai-cli
Verify installation:
pipecat --version
Step 2 – Create a New Voice Agent
Launch the project wizard:
pipecat init
Or generate the official quickstart project:
pipecat init quickstart
The wizard will guide you through selecting:
Platform
- Web Application
- Mobile Application
- Phone Agent
Speech-to-Text Provider
Examples:
- Deepgram
- Speechmatics
- Gladia
AI Model
Examples:
- OpenAI
- Gemini
- Claude
- Local LLMs
Text-to-Speech Provider
Examples:
- Cartesia
- ElevenLabs
- LMNT
Pipecat automatically generates the project structure and starter code.
Step 3 – Configure API Keys
From inside the generated project folder, create your environment file:
cp env.example .env
Add your API keys:
OPENAI_API_KEY=your_key
DEEPGRAM_API_KEY=your_key
CARTESIA_API_KEY=your_key
The official quickstart project typically uses:
- OpenAI
- Deepgram
- Cartesia
You can replace these with other supported providers.
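A missing or empty key usually only shows up as an error once the agent tries to call a provider, so it can help to check the file up front. The sketch below is an illustration in plain Python, not something the generated project necessarily includes: `parse_env_file` and `missing_keys` are hypothetical helpers, the parser handles only simple `KEY=value` lines, and the key names are the three shown above.

```python
from pathlib import Path

# Key names from the example above; change them if you pick other providers.
REQUIRED_KEYS = ("OPENAI_API_KEY", "DEEPGRAM_API_KEY", "CARTESIA_API_KEY")

def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

def missing_keys(env: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty, in order."""
    return [name for name in required if not env.get(name, "").strip()]

if __name__ == "__main__":
    path = Path(".env")
    env = parse_env_file(path.read_text()) if path.exists() else {}
    missing = missing_keys(env)
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All API keys are set.")
```

Note that this check does not catch a placeholder such as `your_key` left in place, so replace every placeholder with a real key before running the agent.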
Step 4 – Install Project Dependencies
If you are not already in it, navigate into your project folder (named my-pipecat-agent in this example):
cd my-pipecat-agent
Install dependencies:
uv sync
This installs all required packages for your voice agent.
Step 5 – Run Your Voice Agent
Start the application:
uv run bot.py
Once the bot has started, open the local URL printed in the terminal in your browser and connect to your AI assistant.
Your voice agent is now ready for testing.
Supported AI Providers
Speech-to-Text
- Deepgram
- OpenAI STT
- Speechmatics
- Gladia
Large Language Models
- OpenAI
- Gemini
- Claude
- Local Models
Text-to-Speech
- Cartesia
- ElevenLabs
- LMNT
- Deepgram TTS
Developers can mix and match providers depending on their requirements.
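One way to think about mixing and matching is as a small configuration with one slot per pipeline stage. The sketch below checks a choice against the providers listed above. The lowercase labels, the `SUPPORTED` table, and the `ProviderChoice` class are all hypothetical names made up for this example; they are not Pipecat identifiers or configuration keys.

```python
from dataclasses import dataclass

# Labels mirror the lists in this article; they are illustrative only.
SUPPORTED = {
    "stt": {"deepgram", "openai", "speechmatics", "gladia"},
    "llm": {"openai", "gemini", "claude", "local"},
    "tts": {"cartesia", "elevenlabs", "lmnt", "deepgram"},
}

@dataclass(frozen=True)
class ProviderChoice:
    """One provider per pipeline stage."""
    stt: str
    llm: str
    tts: str

    def problems(self) -> list[str]:
        """Return a message for each stage whose provider is not listed."""
        issues = []
        for role in ("stt", "llm", "tts"):
            name = getattr(self, role)
            if name not in SUPPORTED[role]:
                issues.append(f"{role}: unknown provider '{name}'")
        return issues

if __name__ == "__main__":
    choice = ProviderChoice(stt="gladia", llm="claude", tts="elevenlabs")
    print(choice.problems() or "All providers recognized.")
```

Because each stage is chosen independently, swapping the text-to-speech engine does not require touching the speech-to-text or model choice.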
Advanced Features
Multi-Agent Workflows
Create specialized agents that can hand conversations to one another.
Examples:
- Reception Agent
- Sales Agent
- Support Agent
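The handoff idea can be sketched without any framework. In the example below, a reception agent keeps the conversation unless a message matches a specialist's keywords. Everything here (`Agent`, `Switchboard`, the keyword lists) is a hypothetical illustration; a real deployment would more likely let the language model decide when to hand off, and this is not how Pipecat itself implements multi-agent workflows.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A named agent with keywords that mark messages it should handle."""
    name: str
    keywords: tuple[str, ...] = ()

    def reply(self, message: str) -> str:
        return f"[{self.name}] handling: {message}"

@dataclass
class Switchboard:
    """Routes each message to a specialist, falling back to reception."""
    reception: Agent
    specialists: list[Agent] = field(default_factory=list)

    def route(self, message: str) -> Agent:
        lowered = message.lower()
        for agent in self.specialists:
            if any(word in lowered for word in agent.keywords):
                return agent
        return self.reception

if __name__ == "__main__":
    board = Switchboard(
        reception=Agent("reception"),
        specialists=[
            Agent("sales", ("price", "buy")),
            Agent("support", ("broken", "error")),
        ],
    )
    for text in ("Hello", "What is the price?", "My device is broken"):
        print(board.route(text).reply(text))
```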
Structured Conversation Flows
Build guided workflows such as:
- Appointment Booking
- Customer Qualification
- Customer Support
- Lead Collection
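A guided workflow is essentially a small state machine: ask for one piece of information, store the answer, move to the next step. The sketch below shows that for appointment booking. The step names, the prompts, and the `BookingFlow` class are assumptions made for illustration and do not reflect Pipecat's own flow API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Ordered (field, prompt) pairs; both are invented for this example.
STEPS = [
    ("name", "What is your name?"),
    ("date", "Which date works for you?"),
    ("time", "What time would you like?"),
]

@dataclass
class BookingFlow:
    """Collects one answer per step, in order."""
    answers: dict[str, str] = field(default_factory=dict)

    def next_prompt(self) -> Optional[str]:
        """Return the next question, or None when every step is answered."""
        for key, prompt in STEPS:
            if key not in self.answers:
                return prompt
        return None

    def submit(self, answer: str) -> None:
        """Store the answer for the first unanswered step."""
        for key, _ in STEPS:
            if key not in self.answers:
                self.answers[key] = answer.strip()
                return
        raise ValueError("flow already complete")

if __name__ == "__main__":
    flow = BookingFlow()
    for spoken in ("Ada", "Monday", "10am"):
        print(flow.next_prompt())
        flow.submit(spoken)
    print(flow.answers)
```

In a voice agent, each prompt would be spoken through text-to-speech and each answer would come from the transcript of the caller's reply.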
Telephony Integrations
Connect AI agents directly to:
- Twilio
- SIP
- PSTN Networks
- Phone Systems
This allows AI agents to answer and place phone calls automatically.
Example Business Use Cases
AI Receptionist
Answer incoming calls and collect customer information.
Appointment Booking Assistant
Schedule appointments automatically.
Lead Qualification Agent
Ask qualifying questions before transferring prospects to a sales representative.
Customer Support Agent
Handle frequently asked questions 24/7.
Recruitment Assistant
Conduct initial candidate screening interviews.
Internal Company Assistant
Provide employees with instant access to company information.
Phone-Based AI Agent
Handle inbound and outbound calls for businesses.
Deployment Options
After testing locally, you can deploy your Pipecat application to:
- Pipecat Cloud
- AWS
- Fly.io
- Modal
- Cerebrium
- Dedicated Servers
- Self-Hosted Infrastructure
This makes Pipecat suitable for both small projects and enterprise-scale deployments.
Why Use Pipecat?
Many voice-agent platforms charge monthly fees and limit customization.
Pipecat gives developers:
- Full control over the conversation pipeline
- Freedom to choose AI providers
- Open-source flexibility
- Production scalability
- Telephony support
- Multi-provider integrations
- Real-time low-latency conversations
Because it is open source, businesses can create highly customized voice agents without being locked into a single vendor.