
no element found

Open timfong888 opened this issue 6 months ago • 28 comments

DO NOT DELETE THIS! Please take the time to fill this out properly. I am not able to help you if I do not know what you are executing and what error messages you are getting. If you are having problems with a specific video make sure to include the video id.

To Reproduce

Steps to reproduce the behavior:

youtube_transcript_api CHvUp1rynek 
no element found: line 1, column 0

What code / cli command are you executing?

For example: I am running

youtube_transcript_api CHvUp1rynek 

Which Python version are you using?

Python 3.12.10

Which version of youtube-transcript-api are you using?

youtube-transcript-api 1.0.3

Expected behavior

Describe what you expected to happen.

For example: I expected to receive the english transcript

Actual behaviour

Describe what is happening instead of the Expected behavior. Add error messages if there are any.

For example: Instead I received the following error message:

# ... error message ...

timfong888 avatar Jun 08 '25 09:06 timfong888

https://github.com/Kakulukian/youtube-transcript/issues/45#issuecomment-2953657921

timfong888 avatar Jun 08 '25 10:06 timfong888

this happened to me also but I avoided it entirely by passing the cookies file

muflone avatar Jun 08 '25 13:06 muflone

this happened to me also but I avoided it entirely by passing the cookies file

For me, it makes no difference with or without the cookie file

Troptrap avatar Jun 08 '25 14:06 Troptrap

I'm running into this as well.

@muflone Could you share more about your workaround?

ShawhinT avatar Jun 08 '25 15:06 ShawhinT

This is currently working for me for version 1.0.3:

youtube_transcript_api foA4Sl_xlMc --language it --cookies ./cookies.txt  --format text

muflone avatar Jun 08 '25 16:06 muflone

Running into the same issue! I am using the SDK, by the way. I made sure EN manual transcripts were available, then fetched them, but stumbled into this error. My code hasn't changed in months, but this has been happening for days.

maherbel avatar Jun 08 '25 16:06 maherbel

Also having this issue. I need around 5 retries to get a transcript now. It was working much better two weeks ago for the same videos.

It seems that YouTube raised its guards; it is also discussed here: #414

danrosenberg avatar Jun 08 '25 16:06 danrosenberg

Also having this issue. I need around 5 retries to get a transcript now. It was working much better two weeks ago for the same videos.

It seems that YouTube raised its guards; it is also discussed here: #414

@danrosenberg ah interesting I think I stop at 5 retries using residential proxy. Seems like it's a matter of retrying and still using this versus a nodejs library.

timfong888 avatar Jun 09 '25 03:06 timfong888
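
The retry approach mentioned above can be sketched as a generic backoff wrapper. This is not part of youtube-transcript-api; the wrapper is mine, and the commented fetch call is an assumption about the 1.x API, so adjust it to whatever your version exposes:

import time

def retry(fn, attempts=5, base_delay=2.0):
    """Call fn(), retrying with exponential backoff on any exception.
    A generic helper, not part of youtube-transcript-api."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

# Example usage (assumes the 1.x fetch API; adjust for your version):
# from youtube_transcript_api import YouTubeTranscriptApi
# transcript = retry(lambda: YouTubeTranscriptApi().fetch("CHvUp1rynek"))

Note that retrying only helps with intermittent blocks; if every request to an IP is being refused, a backoff loop just delays the failure.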

Running into the same issue! I am using the SDK, by the way. I made sure EN manual transcripts were available, then fetched them, but stumbled into this error. My code hasn't changed in months, but this has been happening for days.

@maherbel I first encountered the error using the SDK, but I got the same issue for the same video via the CLI.

timfong888 avatar Jun 09 '25 03:06 timfong888

Is it possible that the format returned has changed? That's what I gathered from reading the comments on a Node.js library encountering similar issues. The error reads like malformed JSON, but I can't figure out what the change to the SDK itself needs to be without further telemetry, like seeing the payload being parsed.

timfong888 avatar Jun 09 '25 03:06 timfong888
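
On the error text itself: "no element found: line 1, column 0" is exactly what Python's XML parser raises when it is handed an empty document, which fits the reports elsewhere in this thread of YouTube returning empty 200 responses (rather than malformed JSON). A quick stdlib check, making no assumptions about the library's internals:

import xml.etree.ElementTree as ET

def parse_error_for(body: str) -> str:
    """Return the ParseError message for a body, or "" if it parses."""
    try:
        ET.fromstring(body)
        return ""
    except ET.ParseError as exc:
        return str(exc)

# An empty response body reproduces the reported error verbatim:
print(parse_error_for(""))  # no element found: line 1, column 0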

IMHO it is the PO token stuff mentioned in #414, particularly via Enrique's thread ending here: https://github.com/jdepoix/youtube-transcript-api/issues/414#issuecomment-2949257318

I have CI tests, and these went from failing once every couple of weeks to once a day starting last week-ish, and then to failing all the time over the last week.

Unfortunately the PO token stuff isn't simple or easy -- until now, this approach worked very well for letting users get transcripts on-device.

There's a bunch of prior art for using PO tokens in various YouTube clients I've found on GitHub. But I burned a lot of time on it and eventually got something that looked like a PO token -- and it still didn't work. And it's not much fun trying to figure out why; you don't exactly get helpful error messages :|

If anyone happens upon a known-good implementation for getting transcripts with a PO token, other than yt-dlp, I'd be grateful for a ping -- there's too much indirection in the yt-dlp code for me to follow it fully yet, because technically the PO token generator feeds a plugin that feeds yt-dlp, if I understand correctly.

jpohhhh avatar Jun 09 '25 23:06 jpohhhh

Is it possible that the format returned has changed? That's what I gathered from reading the comments on a Node.js library encountering similar issues. The error reads like malformed JSON, but I can't figure out what the change to the SDK itself needs to be without further telemetry, like seeing the payload being parsed.

right. seems like they fixed it without any PO token stuff?

dcsilver avatar Jun 10 '25 02:06 dcsilver

Small update from me.

Eight hours after my last run, still the same error. I bought a proxy server and set up the proxy in the API, and now I get a 429 Client Error.

I'm reading the source code now. (I hope I'll find something interesting.)

[screenshot of the error]

I wrote a Python script to get transcripts. For the first few videos there was no error, but after a few different videos I put the fetch in a while loop with a sleep and kept trying.

Now, after debugging the script, I can't get any response at all:

[screenshot of the empty response]

przybylku avatar Jun 10 '25 07:06 przybylku

If the issue still persists even with the cookie file, then the format of the webpage has possibly changed

satyajit-bagchi avatar Jun 10 '25 12:06 satyajit-bagchi

fetching timedtext universally requires POT now: https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide#introduction

related: https://github.com/yt-dlp/yt-dlp/issues/13075

lucyknada avatar Jun 10 '25 12:06 lucyknada

Hi all!

Just letting you know that I am aware of the issue and investigating possible solutions!

As others have noted this is definitely caused by YouTube increasingly enforcing the use of PO tokens. I am still investigating if there are ways to fetch timedtext URLs that won't require a PO token. I am also investigating how we could generate PO tokens, but this is a non-trivial process that seems to be subject to frequent change, therefore I would prefer if I could find way to avoid having to maintain a PO token builder (or relying on a dependency for it)!

If you have any ideas on possible solutions or have looked into reverse-engineering the PO token generation, feel free to jump into the discussion and share your knowledge! 🙂

But please, for the time being, refrain from adding "Same problem here" comments, as this clutters the discussion without adding anything to finding a solution 😉 (I will delete such comments, to allow for a more focused discussion)

jdepoix avatar Jun 10 '25 13:06 jdepoix

As someone who just had this today:

YouTube is returning an empty 200 for the API call. I confirmed this by running curl 'https://www.youtube.com/api/timedtext?v=t_LvB6...DA&key=yt8&kind=asr&lang=en' from multiple IPs.

As similar things happen when your IP is blocked, I would suggest some error checking as a part of the package:

  1. Empty reply
  2. Anything that starts with <!DOCTYPE html>
  3. <2KB reply that cannot be automatically parsed (this is likely in the case of a text/HTML warning from YT)

Joshfindit avatar Jun 10 '25 13:06 Joshfindit
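
The three checks proposed above could be sketched as a small pre-parse guard. This is an illustration of the suggestion, not code from the library; the function name and the exact 2 KB threshold are mine:

def classify_suspicious_response(body: bytes):
    """Return a reason string if the body looks like a YouTube block or an
    empty reply rather than real transcript data, else None."""
    text = body.decode("utf-8", errors="replace").lstrip()
    if not text:
        return "empty reply (possible IP block or missing PO token)"
    if text.startswith("<!DOCTYPE html>"):
        return "HTML page returned instead of transcript data"
    if len(body) < 2048 and not text.startswith(("<?xml", "<transcript", "{", "[")):
        return "short, unparseable reply (likely a text/HTML warning from YouTube)"
    return None

A guard like this could then raise a descriptive exception ("YouTube appears to be blocking this request") instead of surfacing the raw parser error.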

@Joshfindit that's only because you are missing a lot of vital keys in the timedText json response, most notably POT.

lucyknada avatar Jun 10 '25 14:06 lucyknada

@lucyknada good point. I was covering the request that youtube_transcript_api currently uses. My main goal was to say "this sort of thing will keep happening as time goes on. It would be helpful to wrap the code in checks and error messages so that users can clearly understand that these types of issues normally mean YouTube is cracking down again."

Joshfindit avatar Jun 10 '25 14:06 Joshfindit

According to the PO Token guide from yt-dlp, which seems to have a few things figured out already, the PO token is NOT required for some clients, like TVs or Android VR. Maybe that's a good starting point. It's definitely a PO token problem, and the solution is not easy.

Troptrap avatar Jun 10 '25 15:06 Troptrap
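
For context, requesting a video through one of those clients means sending an InnerTube /player request whose client context names that client. A minimal sketch of such a request body, where the clientName comes from the yt-dlp PO Token guide but the clientVersion string is an assumption that would need to be checked against yt-dlp's current client definitions:

def build_innertube_player_body(video_id: str) -> dict:
    """Build a /youtubei/v1/player request body impersonating a client
    that (per the yt-dlp PO Token guide) does not yet require a PO token."""
    return {
        "videoId": video_id,
        "context": {
            "client": {
                "clientName": "ANDROID_VR",
                "clientVersion": "1.60.19",  # assumed value; verify against yt-dlp
            }
        },
    }

The body would be POSTed to https://www.youtube.com/youtubei/v1/player; whether the returned captionTracks URLs work without a PO token for a given client is exactly what yt-dlp keeps tracking, so treat this only as a starting point.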

Weirdly, SearchAPI, a sponsor of this project, still has its service working like a charm. Is it just a question of proxy quality?

vincent38wargnier avatar Jun 10 '25 23:06 vincent38wargnier

I am running into the same issue. Has anyone figured out a solution or a workaround?

Going via the InnerTube API works fine, but it is also strict when it comes to proxies

Seym0n avatar Jun 11 '25 00:06 Seym0n

As a follow-up to my comment in #414: yt-dlp is currently able to retrieve captions without a PO token. It does so using a series of HTTP calls that mimic different clients (tv and ios in the example below). I incorporated yt_dlp in my app using its support for embedding in a Python app. When using the "verbose": True and "debug_printtraffic": True params to YoutubeDL, you can see how it retrieves the captions through a series of HTTP calls. Below is an example:

            [youtube] 79jdKfRUqw0: Downloading webpage (GET /watch?v=)
            [youtube] 79jdKfRUqw0: Downloading tv client config (GET /tv)
            [debug] Loading youtube-sts.612f74a3-main from cache
            [youtube] 79jdKfRUqw0: Downloading tv player API JSON (POST /youtubei/v1/player)
            [youtube] 79jdKfRUqw0: Downloading ios player API JSON (POST /youtubei/v1/player)
            [debug] Loading youtube-nsig.612f74a3-main from cache

It is able to do this because Google has not yet rolled out PO tokens to these clients, so this may break too in the future, but for now it works using their latest nightly build.

enrtrav avatar Jun 11 '25 00:06 enrtrav
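
The embedding setup described above can be sketched roughly like this (the option keys follow yt-dlp's documented embedding API; nothing here is specific to youtube-transcript-api, and the helper names are mine):

def build_ydl_opts(lang: str = "en") -> dict:
    """Options for caption-only downloads, per yt-dlp's embedding API."""
    return {
        "skip_download": True,        # fetch captions only, no media
        "writesubtitles": True,       # manually created subtitles
        "writeautomaticsub": True,    # auto-generated subtitles
        "subtitleslangs": [lang],
        "verbose": True,              # shows which clients are tried (tv, ios, ...)
        "debug_printtraffic": True,   # dumps the underlying HTTP calls
    }

def download_captions(url: str, lang: str = "en") -> None:
    import yt_dlp  # pip install -U yt-dlp (nightly recommended per this thread)
    with yt_dlp.YoutubeDL(build_ydl_opts(lang)) as ydl:
        ydl.download([url])

With the two debug options enabled, the traffic dump shows the same sequence of client requests as the log above.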

Temporary Workaround for youtube-transcript-api Using yt-dlp

I faced the same issue and can confirm that using yt-dlp is currently a reliable workaround to fetch captions, even auto-generated ones.

To keep things simple, I used yt-dlp from the command line to download the .vtt subtitle files and then parsed them using a custom Python script.

Here’s what worked for me:

yt-dlp --write-auto-sub --sub-lang en --skip-download --convert-subs vtt "https://www.youtube.com/watch?v=VIDEO_ID"

This downloads the subtitles in .vtt format. I then used Python to strip the tags and merge the lines into a clean transcript string or JSON.

While this isn't a drop-in replacement for youtube-transcript-api, it can be a practical temporary fix until the PO token issue is resolved or a more robust patch is integrated into the library.

Happy to share my script if it helps anyone.

EDIT: Thanks to @grigio for pointing out '--write-auto-sub'

naganandana-n avatar Jun 11 '25 07:06 naganandana-n
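
The "strip the tags and merge the lines" step mentioned above can be sketched with a minimal stdlib parser. This assumes the standard WEBVTT layout yt-dlp writes; since auto-generated subtitles often repeat a line across consecutive cues, consecutive duplicates are dropped:

import re

def vtt_to_text(vtt: str) -> str:
    """Reduce a WEBVTT document to a plain transcript string."""
    out = []
    for line in vtt.splitlines():
        line = line.strip()
        # Skip the header, cue numbers, timestamp lines, and blanks.
        if (not line or line == "WEBVTT" or line.isdigit() or "-->" in line
                or line.startswith(("Kind:", "Language:", "NOTE"))):
            continue
        # Strip inline tags such as <c>, </c>, and <00:00:01.000>.
        text = re.sub(r"<[^>]+>", "", line).strip()
        if text and (not out or out[-1] != text):  # drop consecutive duplicates
            out.append(text)
    return "\n".join(out)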

When I didn't have time to run scripts, I simply copied the subtitles to the clipboard straight from the browser via a bookmarklet, which also strips the timestamps from the subtitles right away:

javascript: (async function() {
  try {
    let getSubs = async (langCode = 'en') => {
      let response = await fetch(window.location.href);
      let text = await response.text();
      let ytData = text.split('ytInitialPlayerResponse = ')[1]?.split(';var')[0];
      if (!ytData) throw new Error('Subtitles not found!');
      let ct = JSON.parse(ytData).captions?.playerCaptionsTracklistRenderer?.captionTracks;
      if (!ct) throw new Error('No subtitles available for this video!');
      let findCaptionUrl = x => ct.find(y => y.vssId.indexOf(x) === 0)?.baseUrl;
      let firstChoice = findCaptionUrl("." + langCode);
      let url = firstChoice
        ? firstChoice + "&fmt=json3"
        : (findCaptionUrl(".") || findCaptionUrl("a." + langCode) || ct[0].baseUrl) + "&fmt=json3&tlang=" + langCode;
      let subsResponse = await fetch(url);
      let subsData = await subsResponse.json();
      return subsData.events.map(x => ({
        ...x,
        text: x.segs?.map(s => s.utf8)?.join(" ")?.replace(/\n/g, ' ')?.replace(/♪|'|"|\.{2,}|\<[\s\S]*?\>|\{[\s\S]*?\}|\[[\s\S]*?\]/g, '')?.trim() || ''
      }));
    };
    let copyToClipboard = async langCode => {
      const subs = await getSubs(langCode);
      const text = subs.map(x => x.text).join('\n').replace(/\n{2,}/g, '\n');
      await navigator.clipboard.writeText(text);
    };
    await copyToClipboard('en');
  } catch (error) {
    alert(`Error: ${error.message}`);
  }
})();

Yesterday this also stopped working and now gives an empty JSON response. Does anyone have any ideas on how to make this work in the browser again, without yt-dlp?

alt-claymore avatar Jun 11 '25 08:06 alt-claymore

@alt-claymore what's the advantage of using youtube-transcript-api over yt-dlp?

kamish13 avatar Jun 11 '25 08:06 kamish13

@alt-claymore what's the advantage of using youtube-transcript-api over yt-dlp?

yt-dlp takes more time. Sometimes when watching a video you just need to quickly get the subtitles to the clipboard with one click, and that worked very conveniently through browser JS

alt-claymore avatar Jun 11 '25 08:06 alt-claymore

@alt-claymore https://github.com/Kakulukian/youtube-transcript/pull/46 This hasn't been committed to the main branch yet, but it worked well for me.

noir-01 avatar Jun 11 '25 10:06 noir-01

@naganandana-n your command doesn't work here

yt-dlp --write-sub --sub-lang en --skip-download --convert-subs vtt https://www.youtube.com/watch\?v\=REbTO_HhdLg

[youtube] Extracting URL: https://www.youtube.com/watch?v=REbTO_HhdLg
[youtube] REbTO_HhdLg: Downloading webpage
[youtube] REbTO_HhdLg: Downloading tv client config
[youtube] REbTO_HhdLg: Downloading tv player API JSON
[youtube] REbTO_HhdLg: Downloading ios player API JSON
[youtube] REbTO_HhdLg: Downloading m3u8 information
[info] REbTO_HhdLg: Downloading 1 format(s): 401+251
[info] There are no subtitles for the requested languages
[SubtitlesConvertor] There aren't any subtitles to convert

but this works

yt-dlp --write-auto-subs --sub-lang en --skip-download --convert-subs vtt https://www.youtube.com/watch\?v\=REbTO_HhdLg

[youtube] Extracting URL: https://www.youtube.com/watch?v=REbTO_HhdLg
[youtube] REbTO_HhdLg: Downloading webpage
[youtube] REbTO_HhdLg: Downloading tv client config
[youtube] REbTO_HhdLg: Downloading tv player API JSON
[youtube] REbTO_HhdLg: Downloading ios player API JSON
[youtube] REbTO_HhdLg: Downloading m3u8 information
[info] REbTO_HhdLg: Downloading subtitles: en
[info] REbTO_HhdLg: Downloading 1 format(s): 401+251
[info] Writing video subtitles to: ⚠️SMISHING e VISHING: cosa sono e come proteggersi [REbTO_HhdLg].en.vtt
[download] Destination: ⚠️SMISHING e VISHING: cosa sono e come proteggersi [REbTO_HhdLg].en.vtt
[download] 100% of  210.25KiB in 00:00:00 at 532.07KiB/s
[SubtitlesConvertor] Converting subtitles
[SubtitlesConvertor] Subtitle file for vtt is already in the requested format

grigio avatar Jun 11 '25 14:06 grigio

This is currently working for me for version 1.0.3:

youtube_transcript_api foA4Sl_xlMc --language it --cookies ./cookies.txt  --format text

just to confirm: as of today this stopped working for me too, even with cookies. So sorry for the noise, but the previous workaround is not working anymore.

muflone avatar Jun 11 '25 16:06 muflone