Building with a TTS API: From First Request to Production

Why Use a TTS API?

If your app needs to speak (narrating content, reading notifications, generating podcasts, or powering a voice assistant), you need a text-to-speech (TTS) API. Building your own synthesis engine is rarely practical. A managed API gives you production-quality voices with a single HTTP call.

Your First Request

curl -X POST https://aitts.theproductivepixel.com/api/v1/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from the API.",
    "voice_id": "google:en-US-Chirp3HD-Kore",
    "output_format": "mp3"
  }'

The response includes a job ID and status. Poll the status endpoint or use webhooks to know when audio is ready.
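
The polling side of that flow can be sketched in Python roughly as follows. This is a minimal sketch, not the service's documented client: the status path /api/v1/tts/{job_id}, the status field name, and the "completed" and "failed" status values are assumptions that do not appear above, so check them against the API reference. The fetch function is passed in as a parameter so the loop can be exercised without touching the network.

```python
import json
import time
import urllib.request

BASE_URL = "https://aitts.theproductivepixel.com"


def fetch_job(job_id, api_key):
    """GET the job document. The path below is an assumption, not a confirmed endpoint."""
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/tts/{job_id}",  # hypothetical status endpoint
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_job(job_id, api_key, fetch=fetch_job, interval=2.0,
                 max_attempts=30, sleep=time.sleep):
    """Poll until the job reaches a terminal status or the attempts run out."""
    for attempt in range(max_attempts):
        job = fetch(job_id, api_key)
        status = job.get("status")
        if status == "completed":  # assumed status value
            return job
        if status == "failed":  # assumed status value
            raise RuntimeError(f"job {job_id} failed")
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"job {job_id} not finished after {max_attempts} polls")
```

A fixed interval keeps the sketch short; for long jobs a growing delay between polls, or a webhook, puts less load on both sides.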

Choosing a Delivery Mode

Async (default): Submit text, get a job ID, then poll or wait for a webhook. Best for batch processing and background generation.

Streaming: Get a stream_url in the response, then fetch it to receive audio bytes in real time. Best for interactive UIs where latency matters.

{
  "text": "Stream this sentence.",
  "voice_id": "google:en-US-Chirp3HD-Kore",
  "delivery_mode": "stream",
  "output_format": "mp3"
}
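
Consuming the stream might look like the Python sketch below. It assumes that stream_url is a top-level field of the JSON response, that it may be relative to the API host, and that the stream endpoint accepts the same bearer token; none of these details are confirmed above. The copy_stream helper is the part that matters: it writes each chunk as it arrives instead of buffering the whole file.

```python
import json
import urllib.request
from urllib.parse import urljoin

BASE_URL = "https://aitts.theproductivepixel.com"


def copy_stream(source, sink, chunk_size=4096):
    """Copy bytes from a file-like source to a sink chunk by chunk; return bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means the stream has ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def stream_to_file(text, voice_id, api_key, path):
    """Submit a streaming request, then fetch stream_url and write the audio to path."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "delivery_mode": "stream",
        "output_format": "mp3",
    }
    auth = {"Authorization": f"Bearer {api_key}"}
    req = urllib.request.Request(
        f"{BASE_URL}/api/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={**auth, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        job = json.load(resp)
    # Assumption: stream_url is top-level and may be relative to BASE_URL.
    stream_url = urljoin(BASE_URL, job["stream_url"])
    # Assumption: the stream endpoint takes the same bearer token.
    stream_req = urllib.request.Request(stream_url, headers=auth)
    with urllib.request.urlopen(stream_req, timeout=30) as audio, open(path, "wb") as out:
        return copy_stream(audio, out)
```

In an interactive UI the sink would be an audio player's buffer rather than a file, but the read loop stays the same.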

Voice Selection

Browse available voices programmatically:

curl https://aitts.theproductivepixel.com/api/v1/voices

Each voice has a provider, language, gender, and supported formats. Use the voice_id field in generation requests.
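
Once the list is fetched, filtering it is straightforward. The Python sketch below assumes each voice is a JSON object with the keys voice_id, provider, language, gender, and formats. Only voice_id is named above; the other key names and the sample records are made up for illustration, and the real response may be shaped differently.

```python
def find_voices(voices, language=None, gender=None, output_format=None):
    """Return the voice_id of every voice matching all the filters that were given."""
    matches = []
    for voice in voices:
        if language and voice.get("language") != language:
            continue
        if gender and voice.get("gender") != gender:
            continue
        if output_format and output_format not in voice.get("formats", []):
            continue
        matches.append(voice["voice_id"])
    return matches


# Made-up sample records; key names other than voice_id are assumptions.
SAMPLE_VOICES = [
    {"voice_id": "example:en-US-A", "provider": "example", "language": "en-US",
     "gender": "female", "formats": ["mp3", "wav"]},
    {"voice_id": "example:en-US-B", "provider": "example", "language": "en-US",
     "gender": "male", "formats": ["mp3"]},
    {"voice_id": "example:de-DE-A", "provider": "example", "language": "de-DE",
     "gender": "female", "formats": ["wav"]},
]

print(find_voices(SAMPLE_VOICES, language="en-US", output_format="wav"))
# prints ['example:en-US-A']
```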

Handling the Audio

Completed jobs expose an audio_endpoint — a stable URL that returns the audio file with proper content headers. Use this for playback, download, or embedding.

For temporary access, audio_url provides a time-limited signed URL (24h expiry).
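
Choosing between the two fields can be sketched in Python as below. The sketch assumes audio_endpoint is a path relative to the API host, as in the webhook payload shown in this post, and that both fields sit at the top level of the completed job. The playback_url helper and its prefer_signed flag are hypothetical names used only for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://aitts.theproductivepixel.com"


def playback_url(job, prefer_signed=False):
    """Pick a URL for a completed job: the signed audio_url or the stable audio_endpoint."""
    if prefer_signed and job.get("audio_url"):
        # Time-limited signed URL; do not store it long term.
        return job["audio_url"]
    endpoint = job.get("audio_endpoint")
    if not endpoint:
        raise ValueError("job has no audio_endpoint; has it completed?")
    # urljoin resolves a relative path against the host and leaves absolute URLs alone.
    return urljoin(BASE_URL, endpoint)


job = {"job_id": "abc123", "audio_endpoint": "/api/v1/tts/abc123/audio"}
print(playback_url(job))
# prints https://aitts.theproductivepixel.com/api/v1/tts/abc123/audio
```

Storing the stable endpoint and requesting a signed URL only at the moment of sharing avoids persisting links that will expire.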

Webhooks for Production

Instead of polling, configure a webhook URL on your account. You'll receive a POST when jobs complete:

{
  "event": "job.completed",
  "job_id": "abc123",
  "audio_endpoint": "/api/v1/tts/abc123/audio",
  "chars_charged": 42
}
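
A receiver for that payload could be sketched with Python's standard library as follows. It relies only on the fields shown in the example above. The status codes it returns are assumptions about what the sender accepts, and it performs no signature check because this post does not describe one; add a check if the service provides it.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_job_completed(raw_body):
    """Return (job_id, audio_endpoint, chars_charged) for a job.completed event, else None."""
    try:
        event = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return None
    if not isinstance(event, dict) or event.get("event") != "job.completed":
        return None
    return event.get("job_id"), event.get("audio_endpoint"), event.get("chars_charged")


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        result = parse_job_completed(self.rfile.read(length))
        if result is None:
            # Malformed body or an event type this sketch does not handle.
            self.send_response(400)
        else:
            job_id, audio_endpoint, chars = result
            print(f"job {job_id} done: {audio_endpoint} ({chars} chars)")
            self.send_response(204)  # acknowledge quickly; do heavy work elsewhere
        self.end_headers()


# To try it locally:
#   HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```

Keeping the parsing in a plain function makes it easy to unit test and to reuse inside whichever web framework the app already runs.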
