youtube-transcript-api
TranscriptsDisabled But it's not disabled (works locally, fails on Cloud machine)
To Reproduce
using youtube-transcript-api-0.6.2:
cat test.py
from youtube_transcript_api import YouTubeTranscriptApi
print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))
outputs:
python3 ./test.py
Traceback (most recent call last):
File "/root/border0-plugin/./test.py", line 3, in <module>
print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 137, in get_transcript
return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch(preserve_formatting=preserve_formatting)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
return TranscriptListFetcher(http_client).fetch(video_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 48, in fetch
self._extract_captions_json(self._fetch_video_html(video_id), video_id),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 62, in _extract_captions_json
raise TranscriptsDisabled(video_id)
youtube_transcript_api._errors.TranscriptsDisabled:
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=w8rYQ40C9xo! This is most likely caused by:
Subtitles are disabled for this video
If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
What code / cli command are you executing?
I am running
from youtube_transcript_api import YouTubeTranscriptApi
print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))
Which Python version are you using?
Python 3.11.6
Which version of youtube-transcript-api are you using?
youtube-transcript-api-0.6.2
Expected behavior
I expected to receive the English transcript.
I can see it in browser, see screenshot:
Actual behaviour
Traceback (most recent call last):
File "/root/border0-plugin/./test.py", line 3, in <module>
print(YouTubeTranscriptApi.get_transcript('w8rYQ40C9xo'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 137, in get_transcript
return cls.list_transcripts(video_id, proxies, cookies).find_transcript(languages).fetch(preserve_formatting=preserve_formatting)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
return TranscriptListFetcher(http_client).fetch(video_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 48, in fetch
self._extract_captions_json(self._fetch_video_html(video_id), video_id),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/border0-plugin/myenv2/lib/python3.11/site-packages/youtube_transcript_api/_transcripts.py", line 62, in _extract_captions_json
raise TranscriptsDisabled(video_id)
youtube_transcript_api._errors.TranscriptsDisabled:
Could not retrieve a transcript for the video https://www.youtube.com/watch?v=w8rYQ40C9xo! This is most likely caused by:
Subtitles are disabled for this video
If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
Yes, the issue is valid, but it does not seem to occur with the link that YouTube gives us via the Share button.
Hi @atoonk, do you only have this issue with this specific video, or all videos you are trying to retrieve? I can retrieve the subtitles for that video without any issues, which usually means that you are being rate-limited by YouTube (which would also mean that this should happen for all videos).
Hi @jdepoix, I encountered the same problem yesterday with every video I tried. Although I don't use the API frequently, I do access it a few times per day. I hope it's not some new restriction from YouTube. I experienced the same problem as @atoonk, and the issue is still present today.
Thanks a lot for your quick response and for this amazing tool; I really like it.
Hi @SKVNDR, then you're most definitely being blocked by YouTube. The only way to work around this is to change your IP address in any way (VPN, proxy, or assign a new IP if possible).
I can confirm that YouTube is most likely blocking =/ It works from my local dev env but it doesn't work in production all things equal.
If you're running your code on a cloud machine it could be that (depending on your setup) you're getting assigned an IP from a pool that is shared with other machines. So the IP you're using could potentially be blocked without you doing anything. YouTube could also generally blacklist certain IPs that are known to belong to cloud providers (just a guess, I don't know if they actually do that!).
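Since the traceback shows TranscriptsDisabled being raised while the library parses the watch-page HTML, one way to check whether an IP is being blocked is to save the HTML each machine receives and compare the two. Below is a minimal sketch, not the library's own logic: `classify_watch_page` is a hypothetical helper, and the marker strings it looks for are assumptions about what YouTube's pages contain, which may change at any time.

```python
def classify_watch_page(html: str) -> str:
    """Roughly classify a saved YouTube watch page.

    All marker strings below are assumptions about the page contents,
    not a documented contract.
    """
    # Assumed marker: caption track data is embedded under a "captions" key.
    if '"captions":' in html:
        return "captions_present"
    # Assumed marker: a reCAPTCHA widget suggests the IP is being challenged.
    if 'class="g-recaptcha"' in html:
        return "captcha_challenge"
    # Assumed marker: no player data at all, so the page is not a normal watch page.
    if '"playabilityStatus":' not in html:
        return "no_player_data"
    # Player data is there, but no caption data was served to this client.
    return "captions_missing"
```

If the same video yields "captions_present" from a home connection and "captions_missing" from the cloud machine, that points at the IP rather than at the video's subtitle settings.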
Ah yes, I tried it from my laptop at home and it works fine now. And indeed, it affected all videos, which is why I thought it was a bug or new behaviour in the YT API. So I guess YouTube blocked me (this was on a DigitalOcean machine). Bummer, gotta find a way around that. Any docs on the rate-limit numbers or when folks get added? I only run this once every few weeks and only for a dozen videos or so, so I'm a bit surprised I was blocked. Unless it's all of DigitalOcean.
Since this is not an official API, there unfortunately is no information on rate limits and when or for how long you will get blocked. People have been reporting different things, so I don't feel like it is consistent either.
I will pin this issue and leave it open, since there are issues being opened due to this all the time. Feel free to discuss workarounds and share your experience on YouTube's blocking heuristics, but be aware that there is no proper fix here and probably never will be. That's the nature of using an unofficial API unfortunately.
Same for me. I use a droplet on DigitalOcean, and YouTube probably blocked the IP from there, but using a proxy fixed the issue...
How did you set up the proxy? Can you share the code? Did you use a free or a paid proxy, and how did you obtain it?
Hi @auspy,
from youtube_transcript_api import YouTubeTranscriptApi
YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "https://user:pass@domain:port"})
I'm using a paid proxy from smartproxy.com with the "Residential" offer. There are probably other better proxies available; I chose this one randomly.
Confirmed, using a proxy from my droplet worked. I used this to proxy traffic from my DigitalOcean droplet to my local laptop: https://docs.border0.com/docs/expose-a-http-proxy. It lets you expose a proxy on localhost and have it egress on a separate machine (in my case my laptop).
transcript = YouTubeTranscriptApi.get_transcript(video_id, proxies={"https": "http://localhost:8080"})
I can make a more detailed quick video if folks are interested in how to use that.
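For anyone who wants to try a direct request first and only go through a proxy when that fails, here is a minimal sketch. `fetch_with_proxy_fallback` is a hypothetical helper, not part of youtube-transcript-api; it assumes the fetch function accepts a `proxies` keyword the way `get_transcript` does in the snippets in this thread, and the proxy URLs are placeholders.

```python
def fetch_with_proxy_fallback(fetch, video_id, proxy_urls):
    """Call fetch(video_id, proxies=...) directly first, then via each proxy.

    `fetch` is any callable taking (video_id, proxies=...), for example
    YouTubeTranscriptApi.get_transcript. Returns the first successful result.
    """
    # None means "no proxy", i.e. a direct request from this machine.
    attempts = [None] + list(proxy_urls)
    last_error = None
    for proxy_url in attempts:
        # Route both plain and TLS traffic through the same proxy URL.
        proxies = None if proxy_url is None else {"http": proxy_url, "https": proxy_url}
        try:
            return fetch(video_id, proxies=proxies)
        except Exception as error:
            # Broad on purpose for the sketch; in real use, catch only the
            # errors you expect (such as TranscriptsDisabled).
            last_error = error
    # Every attempt failed, so surface the most recent error.
    raise last_error


# Possible usage (requires the library and a working proxy, so left commented out):
# from youtube_transcript_api import YouTubeTranscriptApi
# transcript = fetch_with_proxy_fallback(
#     YouTubeTranscriptApi.get_transcript, "w8rYQ40C9xo", ["http://localhost:8080"]
# )
```

Passing the fetch function in as an argument keeps the helper independent of the library, so it can be tried out with a stub before pointing it at real traffic.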
Having the exact same issue, & also using DigitalOcean droplet
sure would love a video on it. drop the link here
thank you for sharing. this surely looks like a cheap option but I was looking for something free. don't want to pay in initial stages of my project.
@auspy I would appreciate a video, or just more info. I'm all new to proxies etc., and most info online seems to be aimed at more experienced people.
Just ran across this issue today, glad I found this thread. I too am on Digital Ocean, running my code in a Docker container. Getting transcripts runs fine locally, but not on DO.
I would appreciate the video mentioned above, as proxies are new to me. If I use my localhost as a proxy, it means I need to leave the machine running 24/7 right? I mean, I guess that's obvious.
Yep, same with me. It looks like YouTube blocked many DO servers at once: I didn't send that many requests and I'm banned.
I also use a DigitalOcean droplet; I think they block IPs from DO servers. Now I'm using Google Cloud Functions.
I can confirm that it is a problem with DigitalOcean servers being blocked. Using a proxy is the solution.
Now this error also occurs in Google Cloud Functions.
Blocked from dedicated OVH too
Has anyone faced same issue on pythonanywhere?
faced the same issue today in aws ec2
same issue today on aws lambda
Hetzner VPS are blocked too.
Same issue, but on Deepnote environment. Does anyone know how to change Deepnote's IP address or something about changing proxies in Deepnote? When I run the same code on my local environment, the transcription works fine. However, it seems like YouTube is blocking Deepnote's IP.
@danielsanmartin I see that you have forked a repository to avoid the IP ban. Can you write instructions on how to download the CA_BUNDLE?
I have had the same issue since yesterday. Does anyone know whether youtube_transcript_api uses any intermediate servers if proxies are not set explicitly?
I ask because in my case the issue is very strange: youtube-transcript-api works well locally, with and without proxies, but when I set it up on PythonAnywhere it stopped working both without and with proxies.
At the same time, when I make a direct request with the requests library to the YouTube public API endpoint https://www.youtube.com/api/timedtext, using parameters that I extract via Chrome Web Console -> Network tab, it works both locally and on PythonAnywhere, with and without proxies.
What could the issue be? Might it be related to the headers that youtube_transcript_api generates?
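One thing that may be worth ruling out here, as an assumption rather than a confirmed cause: HTTP clients such as requests pick up proxy settings from environment variables by default, so a hosted environment can route traffic through a proxy even when none is passed in code. The sketch below uses only the standard library to list such variables; `proxy_env_snapshot` is a hypothetical helper name.

```python
import os

# Conventional proxy-related variable names, matched case-insensitively.
PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy", "no_proxy")


def proxy_env_snapshot(environ=None):
    """Return the proxy-related environment variables that are set and non-empty."""
    env = os.environ if environ is None else environ
    found = {}
    for name, value in env.items():
        if name.lower() in PROXY_VARS and value:
            found[name] = value
    return found


# On the affected machine: print(proxy_env_snapshot())
```

An empty result on both machines would suggest the difference lies elsewhere, for example in the request headers or in the egress IP itself.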