diart
A Python package to build AI-powered real-time audio applications
Hi, thanks for your repository! I trained a model with a specific voice using the pyannote package, and now I want to use your streaming approach in my task. How...
I have downloaded these two models from HF: 1. https://huggingface.co/pyannote/segmentation-3.0/blob/main/pytorch_model.bin (segmentation model) 2. https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/blob/main/pytorch_model.bin (embedding model) How do I load these models from files? Trying `segmentation = models.SegmentationModel.from_pretrained("PyAnnoteDiarization/pyannote_model_segmentation-3.0.bin", use_hf_token=False)`...
First of all, thanks for the project! I’m using it in a live Speech-to-Text (STT) + diarization setup: https://github.com/QuentinFuxa/whisper_streaming_web I am testing my pipeline using macOS BlackHole, which routes the...
I have a custom streaming pipeline with a VAD setup that triggers ASR processing only when speech is detected on a small chunk. The pipeline operates in a streaming fashion,...
Is there any particular reason the rx library is used? Why do we need asynchronous code in this repo? What if we did not use rx at all, and the code...
Running diart.stream on macOS Sonoma (Apple Silicon) crashes with a RuntimeError: torchaudio_sox::get_info(), apparently because a pathlib.PosixPath object is passed to torchaudio.info, which now expects a str. The same call chain...
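If the cause is as the report above describes (a `pathlib.PosixPath` reaching a function that only accepts `str`), one possible workaround is to coerce the path before the call. The sketch below is a minimal illustration under that assumption; `to_str_path` is a hypothetical helper name, not part of diart or torchaudio, and where exactly it would be applied in the call chain is not stated in the report.

```python
import os
from pathlib import Path
from typing import Union


def to_str_path(path: Union[str, "os.PathLike[str]"]) -> str:
    """Return a plain str for a str or path-like object (hypothetical helper).

    os.fspath returns str inputs unchanged and calls __fspath__ on
    path-like objects such as pathlib.Path.
    """
    return os.fspath(path)


# Example: a Path built with pathlib becomes a plain string.
audio_path = Path("audio") / "sample.wav"
audio_path_str = to_str_path(audio_path)

# Assumed usage at the failing call site (not executed here):
#   torchaudio.info(to_str_path(audio_path))
```

Passing a value that is already a `str` is harmless, so the helper can wrap a call site without first checking the argument's type.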
Hello, how are you? Thanks for contributing to this project. I am trying to add the ReDimNet model (https://github.com/IDRnD/ReDimNet) as the embedding model of the pipeline. But the ReDimNet model does not require...
### Overview Implements a WebSocket server that can handle audio streams from multiple client connections ### Changes - Added multi-client support to WebSocket server - Created `StreamingInferenceHandler` for managing connections...
Based on this code here: https://github.com/juanmc2005/diart/blob/392d53a1b0cd67701ecc20b683bb10614df2f7fc/src/diart/blocks/diarization.py#L50 it seems that attributes like duration, etc. are initialized with an "_" before their name. This raised an issue here: https://github.com/juanmc2005/diart/blob/392d53a1b0cd67701ecc20b683bb10614df2f7fc/src/diart/optim.py#L111 SpeakerDiarizationConfig class...
Hi Developers, Thank you for your amazing work on this project! I was wondering if there’s a way to use Silero VAD. I noticed that PyAnnote VAD is supported, but...