Transcript talks
I discovered yt2doc, which transcribes online videos and audio into readable Markdown documents.
I tried it on this talk, which already has a transcript: https://www.rubyvideo.dev/talks/leveling-up-developer-tooling-for-the-modern-rails-hotwire-era-ruby-turkiye-meetup
- I found the YouTube URL: https://www.youtube.com/watch?v=g2bVdaO8s7s
- I transcribed it with a Docker setup and an Ollama endpoint:
```
docker run --network="host" \
  --mount type=bind,source=/home/debian/yt2doc,target=/app \
  ghcr.io/shun-liang/yt2doc \
  --video https://www.youtube.com/watch?v=g2bVdaO8s7s \
  --timestamp-paragraphs --add-table-of-contents \
  --llm-server http://host.docker.internal:11434/v1 \
  --llm-model qwen2.5:14b -o .
```
- The table of contents (`--add-table-of-contents`) didn't work, but the result is pretty nice IMO: https://gist.github.com/cbldev/143ad5b9fd4d750436d1b244a85d3490
Thanks @cbldev, that sounds promising. Currently the quality of the transcripts is not great; the raw version we get from YouTube is crap. We try to improve it with OpenAI, and while the results are better, they are sometimes incomplete and the timings are often wrong. The video you used wasn't improved; it is the raw transcript from YouTube.
The output from yt2doc looks much better. It is not perfect (some terms are incorrect), but overall it seems far superior to what we have.
Is this something easy to install locally? Can we run it locally and then seed the results in prod? Is it possible to give some context to the transcriber engine so that we can help it with the technical terms that could be used in the talk?
> Is this something easy to install locally?
Yes!
I already had Ollama on my host with qwen2.5:14b pulled.
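(For anyone starting from zero, that part is just the standard Ollama CLI; once the model is pulled, Ollama serves it on localhost:11434 by default.)

```
# Pull the model once; Ollama then serves it on localhost:11434.
ollama pull qwen2.5:14b
```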
I followed the "Run in Docker" section of their README:
```
docker pull ghcr.io/shun-liang/yt2doc
docker run --network="host" \
  --mount type=bind,source=<directory-on-host>,target=/app \
  ghcr.io/shun-liang/yt2doc \
  --video <video-url> --timestamp-paragraphs \
  --llm-server http://host.docker.internal:11434/v1 \
  --llm-model <llm-model> -o .
```
A few minutes later, I had the transcript in a Markdown file. That's all!
> Can we run it locally and then seed the results in prod?
Yes. With a customized output file (`-o some_dir/transcription.md`), it is easy to script the run and push the transcript to an API endpoint, for example.
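Here is a minimal sketch of that idea, assuming a hypothetical `https://example.com/api/transcripts` endpoint and `API_TOKEN` variable (no such API exists yet; the yt2doc flags are the ones used above):

```
#!/usr/bin/env bash
# Transcribe one video locally with yt2doc, then push the Markdown to prod.
set -euo pipefail

VIDEO_URL="$1"
OUT_DIR="$(mktemp -d)"

docker run --network="host" \
  --mount type=bind,source="$OUT_DIR",target=/app \
  ghcr.io/shun-liang/yt2doc \
  --video "$VIDEO_URL" --timestamp-paragraphs \
  --llm-server http://host.docker.internal:11434/v1 \
  --llm-model qwen2.5:14b -o transcription.md

# Hypothetical endpoint: upload the transcript alongside its video URL.
curl -X POST "https://example.com/api/transcripts" \
  -H "Authorization: Bearer ${API_TOKEN}" \
  -F "video_url=${VIDEO_URL}" \
  -F "transcript=@${OUT_DIR}/transcription.md"
```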
> Is it possible to give some context to the transcriber engine so that we can help it with the technical terms that could be used in the talk?
Good question. A first step might be changing the Whisper configuration. Another idea could be to fork the project, customize the system prompt, and try it with a code-specialized LLM like qwen2.5-coder.
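On the Whisper side, the model accepts an "initial prompt" that biases decoding toward expected vocabulary. I haven't checked whether yt2doc exposes this, but as a sketch of the idea, the standalone openai-whisper CLI does:

```
# Bias Whisper toward the talk's technical terms via --initial_prompt.
# (audio.mp3 is a placeholder for the extracted audio file.)
whisper audio.mp3 --model small \
  --initial_prompt "Rails, Hotwire, Turbo, Stimulus, ViewComponent"
```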