mkiol
Thank you for the idea. I understand the need and agree that this would be quite useful. I will investigate what options are available.
> Probably due to the assumptions of flatpak environment, and locations of /qml /plugins dirs

Theming is a nightmare to me, but it should work fine. I've tested it on...
Definitely, looks very interesting. The processing pipeline seems to be as follows:

1. Audio transcription => "words" + timestamps
2. Audio segmentation => "speaker-id" + timestamps
3. Matching "words" to "speaker-id"...
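For step 3, one simple approach is to assign each word to the speaker segment it overlaps the most in time. A minimal Python sketch of that idea (the data shapes here are made up for illustration, not taken from any real model output):

```python
# Hypothetical outputs of steps 1 and 2.
words = [
    {"text": "hello", "start": 0.2, "end": 0.5},
    {"text": "there", "start": 0.6, "end": 0.9},
    {"text": "hi", "start": 1.1, "end": 1.4},
]
segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.5},
]

def assign_speakers(words, segments):
    """Give each word the speaker whose segment overlaps it the most."""
    result = []
    for w in words:
        best, best_overlap = None, 0.0
        for s in segments:
            # Length of the intersection of the two time intervals.
            overlap = min(w["end"], s["end"]) - max(w["start"], s["start"])
            if overlap > best_overlap:
                best, best_overlap = s["speaker"], overlap
        result.append({**w, "speaker": best})
    return result

for w in assign_speakers(words, segments):
    print(f'{w["start"]:.1f}s {w["speaker"]}: {w["text"]}')
```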
I did some research to find out what is possible. It looks as follows:

- Almost everyone uses [pyannote](https://huggingface.co/pyannote) segmentation models for diarization. The models work well... but not perfectly....
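For reference, running one of these models via `pyannote.audio` looks roughly like this (a sketch based on the pyannote docs; the gated model must be accepted on Hugging Face first, and `"HF_TOKEN"` is a placeholder for a real access token):

```python
from pyannote.audio import Pipeline

# Loading the gated model requires a Hugging Face access token.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="HF_TOKEN",  # placeholder
)

diarization = pipeline("audio.wav")

# Each track is a (time interval, speaker label) pair.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s-{turn.end:.1f}s: {speaker}")
```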
Yes, WhisperX uses the same pyannote models. Therefore you have to [pass an HF token](https://github.com/m-bain/whisperX#speaker-diarization) to use diarization :(
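The whole WhisperX flow, as described in its README at the time (a sketch, the API may have changed since; model size, compute type, and the `"HF_TOKEN"` value are placeholder choices):

```python
import whisperx

device = "cpu"
audio = whisperx.load_audio("audio.wav")

# 1. Transcribe with a whisper model.
model = whisperx.load_model("small", device, compute_type="int8")
result = model.transcribe(audio)

# 2. Align the output to get word-level timestamps.
model_a, metadata = whisperx.load_align_model(language_code=result["language"], device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device)

# 3. Diarize; this pulls the gated pyannote models, hence the token.
diarize_model = whisperx.DiarizationPipeline(use_auth_token="HF_TOKEN", device=device)
diarize_segments = diarize_model(audio)

# 4. Attach speaker labels to the transcribed words/segments.
result = whisperx.assign_word_speakers(diarize_segments, result)
```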
@devSJR Unfortunately, the same pyannote models are needed to make it work :( https://github.com/juanmc2005/diart?tab=readme-ov-file#get-access-to--pyannote-models
It is quite an interesting project, I must say. Transcribing a video/audio stream from the Internet... It is definitely doable, but I have to admit I'm not entirely convinced yet. Maybe...
Obviously it is a very good idea 👍🏿 Actually, I didn't know that GitHub had an option for this, but indeed it does. I can change the name...
Thanks for the report. Would you be able to collect logs for this problem? You can do it by starting Jupii with the `--verbose` option:

```
flatpak run net.mkiol.Jupii --verbose
```
Thanks a lot for the investigation. I'm planning to push a new release of Jupii and I will try to resolve this problem in the new version.