Building with a TTS API: From First Request to Production
Why Use a TTS API?
If your app needs to speak, whether to narrate content, read notifications, generate podcasts, or power a voice assistant, you need text-to-speech (TTS). Training and hosting your own synthesis engine is rarely practical. A managed API gives you production-quality voices over plain HTTP requests.
Your First Request
curl -X POST https://aitts.theproductivepixel.com/api/v1/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from the API.",
    "voice_id": "google:en-US-Chirp3HD-Kore",
    "output_format": "mp3"
  }'
Generation is asynchronous by default, so the response includes a job ID and status rather than the audio itself. Poll the status endpoint or use webhooks to know when the audio is ready.
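The polling side of this flow can be sketched as a small loop with exponential backoff. This is a minimal sketch, not the API's official client: the article does not give the status endpoint's path or response fields, so fetch_status is a hypothetical callable you supply, and the "status" key with the values "completed" and "failed" is an assumption.

```python
import time
from typing import Any, Callable, Dict


def wait_for_job(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 1.0,
    max_interval: float = 10.0,
    timeout: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job finishes, doubling the wait between attempts.

    fetch_status is a hypothetical helper that calls the status endpoint
    and returns the decoded JSON. The "status" key and its values are
    assumptions, not documented field names.
    """
    deadline = time.monotonic() + timeout
    delay = interval
    while True:
        job = fetch_status(job_id)
        state = job.get("status")
        if state == "completed":
            return job
        if state == "failed":
            raise RuntimeError(f"job {job_id} failed: {job}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"job {job_id} not ready after {timeout}s")
        sleep(delay)
        # Back off so long-running jobs do not hammer the API.
        delay = min(delay * 2, max_interval)
```

Passing sleep as a parameter keeps the loop easy to test without real delays. In production, webhooks avoid the polling traffic altogether.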
Choosing a Delivery Mode
Async (default): Submit text, get a job ID, then poll or wait for a webhook. Best for batch processing and background generation.
Streaming: Get a stream_url in the response and fetch it to receive audio bytes in real time. Best for interactive UIs where latency matters. To request it, set delivery_mode to "stream" in the request body:
{
  "text": "Stream this sentence.",
  "voice_id": "google:en-US-Chirp3HD-Kore",
  "delivery_mode": "stream",
  "output_format": "mp3"
}
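As a rough illustration of the streaming flow, the sketch below builds the request shown above with Python's standard library and copies a stream to a file-like object in chunks. The helper names build_stream_request and copy_stream are made up for this example, and the 4096-byte chunk size is arbitrary. Whether stream_url is absolute, and whether fetching it needs the Authorization header, is not stated in this article.

```python
import json
import urllib.request
from typing import BinaryIO

API_URL = "https://aitts.theproductivepixel.com/api/v1/tts"


def build_stream_request(
    api_key: str, text: str, voice_id: str, output_format: str = "mp3"
) -> urllib.request.Request:
    """Build (but do not send) a generation request in streaming mode."""
    body = {
        "text": text,
        "voice_id": voice_id,
        "delivery_mode": "stream",
        "output_format": output_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def copy_stream(source: BinaryIO, sink: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy audio bytes as they arrive and return the number of bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total
```

To use it, send the request with urllib.request.urlopen, read stream_url from the JSON response, open that URL with urlopen as well, and pass the resulting response object to copy_stream as the source. An interactive UI would hand each chunk to an audio player instead of writing it to a file.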
Voice Selection
Browse available voices programmatically:
curl https://aitts.theproductivepixel.com/api/v1/voices
Each voice has a provider, language, gender, and supported formats. Use the voice_id field in generation requests.
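Once the voice list is decoded, picking a voice is a simple filter. The sketch below assumes each voice is a dictionary with the keys "language", "gender", and "formats" alongside voice_id. Only voice_id is named in this article, so check the other key names and the overall response shape against a real response before relying on them.

```python
from typing import Any, Dict, Iterable, List, Optional


def filter_voices(
    voices: Iterable[Dict[str, Any]],
    language: Optional[str] = None,
    gender: Optional[str] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Return the voice_id of every voice matching all given criteria.

    The "language", "gender", and "formats" keys are assumed names.
    A criterion left as None is not applied.
    """
    matches = []
    for voice in voices:
        if language and voice.get("language") != language:
            continue
        if gender and voice.get("gender") != gender:
            continue
        if output_format and output_format not in voice.get("formats", []):
            continue
        matches.append(voice["voice_id"])
    return matches
```

For example, filter_voices(voices, language="en-US", output_format="mp3") would return the IDs you can pass as voice_id in a generation request.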
Handling the Audio
Completed jobs expose an audio_endpoint — a stable URL that returns the audio file with proper content headers. Use this for playback, download, or embedding.
For temporary access, audio_url provides a signed URL that expires after 24 hours.
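The webhook payload in the next section shows audio_endpoint as a path rather than a full URL, so a client may need to resolve it against the API host before playback. The helper below is a minimal sketch of that step. The name playback_url is hypothetical, and preferring the stable endpoint over the signed URL is a design choice made for this example, not a rule from the API.

```python
from typing import Any, Mapping, Optional
from urllib.parse import urljoin

BASE_URL = "https://aitts.theproductivepixel.com"


def playback_url(job: Mapping[str, Any], base_url: str = BASE_URL) -> Optional[str]:
    """Pick a URL for a completed job's audio.

    Prefers the stable audio_endpoint and falls back to the
    time-limited audio_url. Returns None if neither is present.
    """
    endpoint = job.get("audio_endpoint")
    if endpoint:
        # urljoin leaves absolute URLs alone and resolves paths against the host.
        return urljoin(base_url, endpoint)
    return job.get("audio_url")
```

Store the stable URL if you need to play the audio again later. A signed URL saved in a database will stop working once it expires.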
Webhooks for Production
Instead of polling, configure a webhook URL on your account. You'll receive a POST when jobs complete:
{
  "event": "job.completed",
  "job_id": "abc123",
  "audio_endpoint": "/api/v1/tts/abc123/audio",
  "chars_charged": 42
}
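A receiver for this payload can be sketched with Python's built-in http.server. Treat it as an illustration under assumptions. The article does not say whether deliveries are retried or signed, so the duplicate check below is a defensive guess and no signature verification is shown. The names parse_completed_job and WebhookHandler are made up for this example.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Any, Dict, Optional, Set

_seen_jobs: Set[str] = set()  # in-memory only; use a database in production


def parse_completed_job(raw_body: bytes, seen: Set[str]) -> Optional[Dict[str, Any]]:
    """Return the payload for a new job.completed event, otherwise None."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # malformed JSON or invalid UTF-8
        return None
    if not isinstance(payload, dict) or payload.get("event") != "job.completed":
        return None
    job_id = payload.get("job_id")
    if not job_id or job_id in seen:
        return None  # missing ID, or a delivery we already handled
    seen.add(job_id)
    return payload


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        job = parse_completed_job(self.rfile.read(length), _seen_jobs)
        if job is not None:
            print(f"job {job['job_id']} ready at {job.get('audio_endpoint')}")
        # Acknowledge quickly and do heavy work (downloads, transcoding) elsewhere.
        self.send_response(204)
        self.end_headers()
```

To try it locally, serve the handler with http.server.HTTPServer(("127.0.0.1", 8000), WebhookHandler).serve_forever() and POST the sample payload to it with curl.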