Reliable transcript extraction over a simple REST endpoint. We handle YouTube's bot detection, IP rotation, and captionless videos (Whisper fallback) — you just call an API that works.
Open-source libraries scrape YouTube from your own servers, so your IPs are the ones that get blocked. With us, YouTube's bot detection is our problem, not yours, and failed requests are never charged.
Videos without captions fall back to Whisper speech-to-text automatically, so one endpoint covers captioned and captionless videos alike.
Timestamped segments with language detection. No XML scraping, no player-response parsing.
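Each segment in the example response below carries an `offset` and a `duration`. The units are not stated here; assuming they are milliseconds, a minimal bash sketch for turning one segment into an SRT subtitle cue could look like this. `ms_to_srt` and `srt_cue` are hypothetical helper names, not part of the API.

```bash
# ASSUMPTION: offset and duration are in milliseconds.

# ms_to_srt <milliseconds>
# Formats a millisecond count as an SRT timestamp (HH:MM:SS,mmm).
ms_to_srt() {
  local ms=$1
  printf '%02d:%02d:%02d,%03d' \
    $(( ms / 3600000 )) $(( ms / 60000 % 60 )) $(( ms / 1000 % 60 )) $(( ms % 1000 ))
}

# srt_cue <index> <offset_ms> <duration_ms> <text>
# Prints one SRT cue: index line, "start --> end" line, text, blank line.
# The end time is offset + duration.
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' \
    "$1" "$(ms_to_srt "$2")" "$(ms_to_srt $(( $2 + $3 )))" "$4"
}
```

For the first segment shown, `srt_cue 1 18800 3600 "We're no strangers to love"` yields a cue running from `00:00:18,800` to `00:00:22,400`.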
curl "https://ytranscript.com/api/v1/transcript?videoId=dQw4w9WgXcQ" \
  -H "Authorization: Bearer yk_live_YOUR_KEY"
{
"video_id": "dQw4w9WgXcQ",
"lang": "en",
"whisper": false,
"cached": false,
"units_charged": 1,
"segments": [
{ "text": "We're no strangers to love", "offset": 18800, "duration": 3600 },
...
]
}

| Parameter | Type | Description |
|---|---|---|
| videoId | string | 11-character YouTube video ID |
| url | string | Full YouTube URL (alternative to videoId) |
| lang | string, optional | Preferred language code (e.g. en, es). Defaults to the video's source language. |
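
A minimal bash sketch of the same request using the parameters above. It assumes the API key lives in an environment variable named `YT_API_KEY` (a hypothetical name). It sends a bare 11-character ID as `videoId` and anything else as `url`, and lets `curl -G --data-urlencode` build and percent-encode the query string. The ID check assumes YouTube's usual ID alphabet of letters, digits, `-` and `_`.

```bash
# is_video_id <string>
# Succeeds if the argument looks like an 11-character YouTube video ID
# (ASSUMPTION: letters, digits, '-' and '_').
is_video_id() {
  [[ $1 =~ ^[A-Za-z0-9_-]{11}$ ]]
}

# fetch_transcript <videoId-or-URL> [lang]
# -G turns each --data-urlencode pair into a query parameter, so a full
# YouTube URL is percent-encoded safely. YT_API_KEY is a hypothetical
# variable name; the script aborts with a message if it is unset.
fetch_transcript() {
  local target=$1 lang=${2:-}
  local -a args=(-sS -G "https://ytranscript.com/api/v1/transcript"
                 -H "Authorization: Bearer ${YT_API_KEY:?set YT_API_KEY}")
  if is_video_id "$target"; then
    args+=(--data-urlencode "videoId=$target")
  else
    args+=(--data-urlencode "url=$target")
  fi
  # lang is optional; omit it to get the video's source language.
  if [[ -n $lang ]]; then
    args+=(--data-urlencode "lang=$lang")
  fi
  curl "${args[@]}"
}
```

For example, `fetch_transcript "https://www.youtube.com/watch?v=dQw4w9WgXcQ" es` sends the URL form with a preferred language of `es`.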
Units: caption transcripts cost 1 unit. Whisper transcriptions (captionless videos) cost 15 units. Failed requests cost nothing.
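With these prices, budgeting a batch is simple arithmetic. A small sketch; `estimate_units` is a hypothetical helper. Because failed requests are free, the result is the cost if every request succeeds, and failures can only lower it.

```bash
# estimate_units <captioned_count> <captionless_count>
# Cost if every request succeeds: 1 unit per caption transcript,
# 15 units per Whisper transcription.
estimate_units() {
  local captioned=$1 captionless=$2
  echo $(( captioned * 1 + captionless * 15 ))
}
```

A batch of 90 captioned and 10 captionless videos comes to 90 + 10 × 15 = 240 units.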
Headers: every response includes X-Quota-Limit, X-Quota-Used, and X-RateLimit-Remaining.
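One way to read these headers from a shell script is to save them with `curl -D` and extract values with awk. A minimal sketch; the helper names are hypothetical, and `quota_remaining` assumes remaining quota is simply `X-Quota-Limit` minus `X-Quota-Used`.

```bash
# header_value <header-file> <Header-Name>
# Prints the value of the first matching header in a file written by
# `curl -D`. Names are compared case-insensitively (HTTP/2 lowercases
# them), and the trailing CR of each header line is stripped.
header_value() {
  awk -v want="$2" '
    BEGIN { want = tolower(want) }
    {
      sub(/\r$/, "")
      i = index($0, ":")
      if (i && tolower(substr($0, 1, i - 1)) == want) {
        v = substr($0, i + 1)
        sub(/^[ \t]+/, "", v)
        print v
        exit
      }
    }' "$1"
}

# quota_remaining <header-file>
# ASSUMPTION: remaining = X-Quota-Limit - X-Quota-Used.
quota_remaining() {
  local limit used
  limit=$(header_value "$1" X-Quota-Limit)
  used=$(header_value "$1" X-Quota-Used)
  echo $(( limit - used ))
}
```

For example, run `curl -sS -D headers.txt -o body.json -H "Authorization: Bearer yk_live_YOUR_KEY" "https://ytranscript.com/api/v1/transcript?videoId=dQw4w9WgXcQ"`, then `header_value headers.txt X-RateLimit-Remaining` or `quota_remaining headers.txt`.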
Errors: 401 invalid key · 402 quota exhausted · 404 no transcript available · 429 rate limited · 500 fetch failed (not charged).
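In a script, it helps to decide per status code whether retrying makes sense. The split below (retry on 429 and 500, stop on 401, 402 and 404) is a judgment call based on the descriptions above, not documented behavior, and treating any 2xx as success is an assumption. `classify_status` is a hypothetical helper.

```bash
# classify_status <http-code>
# Prints ok, retry or stop.
classify_status() {
  case $1 in
    2[0-9][0-9]) echo ok ;;      # ASSUMPTION: any 2xx is a success
    429|500)     echo retry ;;   # rate limited / fetch failed: try again later
    401|402|404) echo stop ;;    # bad key, no quota, no transcript: retrying won't help
    *)           echo stop ;;    # unknown code, or 000 (curl network error)
  esac
}
```

Pair it with `curl -sS -o body.json -w '%{http_code}' ...`, which writes the body to a file and prints only the status code for `classify_status` to inspect.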
Reliability: extraction is an arms race with YouTube. Success rates are high but not 100%, which is why failed requests never consume quota. We run multi-strategy fallbacks (edge, direct, residential, Whisper), so your code only ever calls one endpoint.
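Since failed requests don't consume quota, retrying transient failures on the client side costs nothing in units. A minimal exponential-backoff sketch follows. `retry_request` and `RETRY_BASE_DELAY` are hypothetical names, and retrying only on 429 and 500 is an assumption. The wrapped command must print an HTTP status code, as `curl -w '%{http_code}'` does.

```bash
# retry_request <max-attempts> <command...>
# Runs the command, which must print an HTTP status code on stdout.
# On 429 or 500 it sleeps and retries, doubling the delay each time
# (RETRY_BASE_DELAY seconds first, default 1). Prints the last code seen.
# Returns 0 once a non-retryable code arrives (success or not; the caller
# inspects the code), 1 if every attempt was a 429/500.
retry_request() {
  local max=$1 attempt=1 delay=${RETRY_BASE_DELAY:-1} code
  shift
  while :; do
    code=$("$@")
    case $code in
      429|500) ;;                                 # transient: maybe retry
      *) printf '%s\n' "$code"; return 0 ;;       # final answer (000 included)
    esac
    if (( attempt >= max )); then
      printf '%s\n' "$code"
      return 1
    fi
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))
  done
}
```

Usage: `retry_request 4 curl -sS -o body.json -w '%{http_code}' -H "Authorization: Bearer yk_live_YOUR_KEY" "https://ytranscript.com/api/v1/transcript?videoId=dQw4w9WgXcQ"`. Each attempt overwrites `body.json`, so after a successful run it holds the final response.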