Marianna
Oh, so this means we can't scrape their data? On Thu, 19 Jan 2023, 21:44 Richard Nagyfi, ***@***.***> wrote: > via https://www.khanacademy.org/about/tos > > 8. Prohibited Conduct > YOU AGREE NOT...
yt-dlp is better. It has more options and fewer limitations. On Thu, 19 Jan 2023, 17:46 Richard Nagyfi, ***@***.***> wrote: > Nice, I was thinking about https://pytube.io/en/latest/user/captions.html > > —...
Hi everyone! I already have a working pipeline for YouTube subtitle extraction, and I've already collected subtitles for 1.5M+ videos. Regarding copyright, I don't think that subtitles have strict copyright...
Hi @totuta! I'm just using [yt-dlp](https://github.com/yt-dlp/yt-dlp) and multithreading, but if you want I will share the full script :)
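For anyone curious what a yt-dlp plus multithreading setup might look like, here is a minimal sketch. It is not the script mentioned above, just an illustration under assumptions: the helper names (`build_subtitle_opts`, `fetch_subtitles`, `fetch_all`), the output directory, the `vtt` format, and the worker count are all made up for the example. The option keys are the ones yt-dlp's `YoutubeDL` accepts when embedded in Python.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def build_subtitle_opts(out_dir="subs", langs=("en",)):
    """Build yt-dlp options that fetch subtitles only (hypothetical helper)."""
    return {
        "skip_download": True,      # do not download the video/audio itself
        "writesubtitles": True,     # uploader-provided subtitles
        "writeautomaticsub": True,  # also accept auto-generated captions
        "subtitleslangs": list(langs),
        "subtitlesformat": "vtt",   # assumed format, pick what suits you
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
        "quiet": True,
    }


def fetch_subtitles(url, opts):
    """Fetch subtitles for one URL and return yt-dlp's return code."""
    import yt_dlp  # imported lazily so the other helpers work without it

    with yt_dlp.YoutubeDL(opts) as ydl:
        return ydl.download([url])


def fetch_all(urls, opts, fetch=fetch_subtitles, max_workers=8):
    """Run `fetch` over all URLs in a thread pool.

    Returns a dict mapping each URL to None on success, or to the
    exception that was raised for it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url, opts): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                future.result()
                results[url] = None
            except Exception as exc:  # keep going if one video fails
                results[url] = exc
    return results
```

Threads are a reasonable fit here because the work is network-bound. Passing `fetch` as a parameter also makes it easy to swap in a stub when trying the pool logic without hitting YouTube.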
Hey @ontocord, what do you mean by "convert dialog to instruction"? Do you mean we can build an additional language model specifically for this task?
@Shtoner I have also downloaded JRE and Lex Fridman, as well as other podcasts and shows. I can send you a list of channels I scraped if you want but...
The problem with YT subtitles is that they're just plain text with no punctuation. Is there a way to convert them to dialog? @ontocord
I converted some good (human-generated) captions from Lex Fridman to dialogue. Here's what I got: ```B: "Battle not with monsters, lest ye become a monster, and if you gaze into...
Oh, I think it's actually cool. Do you think we can use some open-source models instead of davinci (e.g. BLOOM)?
Yeah, I don't think OpenAI will be suitable for us at scale :)