Enhance performance using asynchronous HTTP and the aiohttp library
Is your feature request related to a problem? Please describe.
Enhance performance by using asynchronous code, specifically when consuming HTTP streams. The aiohttp library is mature and can be easily integrated.
Describe the solution you'd like
Ditch requests for aiohttp.
Describe alternatives you've considered
Using async with requests, but this is not supported by the project.
Hi @clemlesne!
Thank you for the suggestion!
I am definitely up for adding an async interface to the module! Just a few notes on what you wrote:
Enhance performance by using asynchronous code
Just to make this clear: this will not enhance the performance of fetching a single transcript. It will only make it easier to fetch a batch of transcripts concurrently, in a non-blocking fashion.
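To illustrate the distinction, here is a minimal sketch. `fetch_transcript` is a hypothetical stand-in that simulates network latency with `asyncio.sleep`; it is not this library's API. A single call takes just as long as before, while a batch finishes in roughly the time of one call because the waits overlap.

```python
import asyncio
import time


async def fetch_transcript(video_id: str, latency: float = 0.2) -> str:
    """Hypothetical stand-in for an async transcript fetch (not the library's API)."""
    await asyncio.sleep(latency)  # simulates waiting on the HTTP response
    return f"transcript for {video_id}"


async def fetch_batch(video_ids: list[str]) -> list[str]:
    # gather() schedules all coroutines at once, so their waits overlap.
    return await asyncio.gather(*(fetch_transcript(v) for v in video_ids))


def timed(coro) -> tuple[float, object]:
    """Run a coroutine to completion and return (elapsed seconds, result)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return time.perf_counter() - start, result
```

Calling `timed(fetch_transcript("a"))` and `timed(fetch_batch(["a", "b", "c"]))` should report similar elapsed times, whereas three sequential blocking calls would take about three times as long.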
Ditch requests for aiohttp
Ditching requests altogether and making the full interface async would completely break the API, which I would like to avoid. Also, a sync API can make it easier for people to get started if they are not already in an async runtime. However, I would be open to adding a YouTubeTranscriptAsyncApi class which exposes the same interface, but with async methods.
All that being said, I currently don't really have the time available to implement this, but I would be very willing to take contributions on this! 🙂
Just to make this clear: this will not enhance the performance of fetching a single transcript. It will only make it easier to fetch a batch of transcripts concurrently, in a non-blocking fashion.
We’re 100% aligned
Ditching requests altogether and making the full interface async would completely break the API
What would you think of wrapping the new async code within asyncio.run? Plus maybe log a warning in the console advising to use async code, to motivate using the best practice. That would avoid duplicating the logic while improving concurrency for advanced users.
Hi @clemlesne!
Sorry for coming back to you so late on this, I had lost track of this issue a bit!
What would you think of wrapping the new async code within asyncio.run?
Yes, that is a good point, we could do that! I should probably rephrase what I said: I am not opposed to ditching requests, but rather opposed to ditching the sync API. In case the sync calls just wrap the async calls with asyncio.run, I wouldn't mind ditching requests!
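As a rough sketch of that idea, assuming hypothetical class names (`TranscriptAsyncApi` and `TranscriptApi` are illustrative, not the library's actual classes), the sync method can simply delegate to the async one:

```python
import asyncio


class TranscriptAsyncApi:
    """Hypothetical async implementation; names are illustrative only."""

    async def fetch(self, video_id: str) -> str:
        await asyncio.sleep(0)  # placeholder for a real non-blocking HTTP request
        return f"transcript for {video_id}"


class TranscriptApi:
    """Sync facade that reuses the async logic instead of duplicating it."""

    def __init__(self) -> None:
        self._async_api = TranscriptAsyncApi()

    def fetch(self, video_id: str) -> str:
        # asyncio.run() creates a fresh event loop, runs the coroutine,
        # and closes the loop again before returning the result.
        return asyncio.run(self._async_api.fetch(video_id))
```

One caveat with this design: `asyncio.run` raises a `RuntimeError` when called from a thread that already has a running event loop (for example inside an async web framework or a notebook), so callers in that situation would need to use the async class directly.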
Plus maybe log a warning in the console advising to use async code, to motivate using the best practice
I don't think we should do this, to be honest. This is just a small library that is often integrated into an existing stack. I don't think this library should try to dictate what that stack should look like! Whether the user can migrate to async is often well beyond the scope of integrating this library, and I'd rather keep integration as simple as possible.
Hi @jdepoix,
I'd like to work on this issue to add async support for better performance. I've implemented the core logic using httpx, including classes like YoutubeTranscriptAsyncApi, TranscriptListFetcherAsync, etc. Tests and any final polish are still pending.
Quick overview of my approach:
- Added an async version of the API with methods like async def fetch() and async def list().
- Handled proxies, retries, and error raising asynchronously.
- Introduced async gathering for bulk fetches (e.g., fetch_all for multiple video IDs).
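For readers following along, a bulk helper of the kind described in the last bullet could look roughly like the sketch below. This is an assumption-laden illustration, not the code from the draft: the `fetch_all` signature, the `fetch_one` callback, the `max_concurrency` parameter, and `fake_fetch` are all hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Union

Fetcher = Callable[[str], Awaitable[str]]


async def fetch_all(
    video_ids: Iterable[str],
    fetch_one: Fetcher,
    max_concurrency: int = 5,
) -> Dict[str, Union[str, BaseException]]:
    """Fetch many transcripts concurrently; a failure is recorded per video ID."""
    semaphore = asyncio.Semaphore(max_concurrency)  # caps in-flight requests

    async def guarded(video_id: str) -> str:
        async with semaphore:
            return await fetch_one(video_id)

    ids = list(video_ids)
    # return_exceptions=True returns each exception as a result
    # instead of propagating the first one to the caller.
    results = await asyncio.gather(*(guarded(v) for v in ids), return_exceptions=True)
    return dict(zip(ids, results))


async def fake_fetch(video_id: str) -> str:
    """Hypothetical fetcher used only for demonstration."""
    await asyncio.sleep(0.01)
    if video_id == "bad":
        raise ValueError("transcript unavailable")
    return f"transcript for {video_id}"
```

With this shape, `asyncio.run(fetch_all(["a", "bad", "b"], fake_fetch))` returns a dict in which the failed ID maps to its exception, so one unavailable transcript does not discard the rest of the batch.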
Does this align with what you had in mind? Any specific requirements or preferences (e.g., sticking strictly to aiohttp)? I'd appreciate feedback before I finalize tests to ensure I'm on the right track.
Happy to share a draft PR or code snippets if helpful.
@kaya70875 nice work! It would be great if you could create the draft PR and share it so others can take a look. It may already be good enough for people who are tinkering, and of course you can still update it later based on what the maintainer says 🙂
@CodeWithOz Of course! I've created a draft PR for review.