ai-meeting-transcription
AI Tool for meeting transcriptions
AI Meeting Transcription
A repo showcasing an AI meeting transcription tool.
Summary
This repo showcases a basic tool for meeting transcription. It's targeted at meetings conducted in English, but with a little tweaking it could be used for other languages as well.
Workflow
The tool works in a three-step process:
- It extracts the audio track from the given video file or YouTube link.
- It generates speaker diarization (separating different speaker tracks) using the pyannote/speaker-diarization-3.0 model.
- Finally, it generates the transcription using the OpenAI Whisper model. By default it uses the Whisper base.en version, but you can select other model sizes. The output is saved to the output.sub file in SubViewer format.
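The three steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily how the repo implements them: the helper names (ffmpeg_extract_cmd, diarize, transcribe, pick_speaker, to_subviewer), the 16 kHz mono WAV settings, the "most overlap wins" speaker assignment, and the "SPEAKER: text" line layout are all assumptions. YouTube download is left out, and the use_auth_token parameter assumes a pyannote.audio 3.x installation.

```python
import subprocess


def ffmpeg_extract_cmd(video_path, wav_path):
    """Build an ffmpeg command that extracts a mono 16 kHz WAV track (assumed settings)."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", "-ar", "16000", wav_path]


def diarize(wav_path, hf_token):
    """Return (start, end, speaker) turns from the pyannote diarization pipeline."""
    from pyannote.audio import Pipeline  # imported lazily: heavy dependency

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.0",
        use_auth_token=hf_token,  # parameter name assumed from pyannote.audio 3.x
    )
    diarization = pipeline(wav_path)
    return [
        (turn.start, turn.end, speaker)
        for turn, _, speaker in diarization.itertracks(yield_label=True)
    ]


def transcribe(wav_path, model_name="base.en"):
    """Return (start, end, text) segments from OpenAI Whisper."""
    import whisper  # imported lazily: heavy dependency

    model = whisper.load_model(model_name)
    result = model.transcribe(wav_path)
    return [(s["start"], s["end"], s["text"].strip()) for s in result["segments"]]


def pick_speaker(start, end, turns):
    """Pick the speaker whose turn overlaps [start, end] the most, else 'UNKNOWN'."""
    best, best_overlap = "UNKNOWN", 0.0
    for t_start, t_end, speaker in turns:
        overlap = min(end, t_end) - max(start, t_start)
        if overlap > best_overlap:
            best, best_overlap = speaker, overlap
    return best


def subviewer_time(seconds):
    """Format seconds as HH:MM:SS.cc (centiseconds)."""
    cs = int(round(seconds * 100))
    h, rem = divmod(cs, 360000)
    m, rem = divmod(rem, 6000)
    s, c = divmod(rem, 100)
    return f"{h:02d}:{m:02d}:{s:02d}.{c:02d}"


def to_subviewer(segments, turns):
    """Render segments as 'start,end' + text entries separated by blank lines."""
    blocks = []
    for start, end, text in segments:
        speaker = pick_speaker(start, end, turns)
        blocks.append(f"{subviewer_time(start)},{subviewer_time(end)}\n{speaker}: {text}\n")
    return "\n".join(blocks)


def run_pipeline(video_path, hf_token, wav_path="audio.wav", out_path="output.sub"):
    """Chain the three steps and write the result to out_path."""
    subprocess.run(ffmpeg_extract_cmd(video_path, wav_path), check=True)
    turns = diarize(wav_path, hf_token)
    segments = transcribe(wav_path)
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(to_subviewer(segments, turns))
```

The heavy imports sit inside diarize and transcribe so the formatting helpers can be tried without the models installed. The optional SubViewer [INFORMATION] header is omitted here.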
Local processing
All processing is done locally on the user's machine. The model weights are downloaded to the local ~/.cache folder (on macOS).
- Speaker Diarization 3.0 model weights around 6 MB
- Whisper Base.en model weights around 300 MB
Setup
Install Dependencies
Install the following dependencies (on macOS):
- ffmpeg CLI - brew install ffmpeg
- Python 3 installation - e.g. Miniconda or Homebrew package.
- Python packages - pip3 install -r requirements.txt
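Before running the tool, it can help to confirm that the ffmpeg CLI is actually on the PATH. The snippet below is a small, hypothetical check that is not part of the repo; missing_tools is an illustrative name.

```python
import shutil


def missing_tools(tools=("ffmpeg",)):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required CLI tools found.")
```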
Hugging Face token
In order to download the models used by this tool you need to:
- Generate a private Hugging Face auth token - instructions here
- Create a .env file inside the root repo folder with the following content:
HUGGINGFACE_AUTH_TOKEN="your token here..."
- Accept the Speaker diarization 3.0 model terms of service - link here
- Accept the "Powerset" speaker segmentation model terms of service - link here
Running
Web UI
To start the Web UI, run python3 ./web-ui.py in the repo folder. This should open the Web UI in the browser.
Jupyter Notebook
The tool can also be used from JupyterLab or Jupyter Notebook: open Transcription.ipynb in JupyterLab.
Notes
The speaker diarization step is the longest part of model execution. It takes roughly 30 s to process each minute of the meeting on an M1 MacBook Pro.
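To get a feel for that figure, the helper below scales it to a whole meeting. It is a rough estimate only: estimated_diarization_seconds is a hypothetical name, and the 30 s per minute rate is the M1 MacBook Pro figure quoted above, which will differ on other hardware.

```python
def estimated_diarization_seconds(meeting_minutes, seconds_per_minute=30.0):
    """Estimate diarization time from meeting length, at ~30 s per meeting minute."""
    return meeting_minutes * seconds_per_minute


# A one-hour meeting: 60 * 30 s = 1800 s, i.e. about 30 minutes of diarization.
print(estimated_diarization_seconds(60) / 60, "minutes")
```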
Troubleshooting
- If you get the following error: "Could not download 'pyannote/segmentation-3.0' model. It might be because the model is private or gated so make sure to authenticate." then make sure you provided the Hugging Face auth token AND accepted the Speaker diarization 3.0 model terms of service.