Feature Request Mimic 3 voices

Open gitchat1 opened this issue 1 year ago • 11 comments

First of all, thanks for all your hard work! It is really appreciated. I was wondering if you could integrate the Mimic 3 voices into Speech Note. They make fewer mistakes when actually reading text and have extremely low hardware requirements.

gitchat1 avatar Jan 31 '24 21:01 gitchat1

I'm glad you find Speech Note useful!

Mimic 3 voices are already supported and enabled. You should be able to find them in the model browser in the app.

What version and what package type (Flatpak, AUR, Deb) are you using?

Mimic 3 requires a Python runtime and currently all Python components are only enabled on x86_64, so you won't actually be able to use them on ARM.

mkiol avatar Feb 01 '24 08:02 mkiol

Thanks, I updated my Flatpak and it worked like a charm. However, now when I click on the "Other" tab, the programme tells me that there is an addon for GPU acceleration available that is currently not installed. Do you have any idea how I could fix this?

gitchat1 avatar Feb 01 '24 12:02 gitchat1

there is an addon for GPU acceleration available that is currently not installed. Do you have any idea how I could fix this?

If you have an NVIDIA or AMD graphics card and want to speed up processing, you need to install an additional Flatpak package. You can use an application manager (like 'Discover' in KDE Plasma) or install it from the command line in a terminal.

Command line:

for NVIDIA card:

flatpak install net.mkiol.SpeechNote.Addon.nvidia

for AMD card:

flatpak install net.mkiol.SpeechNote.Addon.amd

-- Update -- Please keep in mind that the GPU add-on package is very large. Mimic 3 voices do not need GPU acceleration, so you don't need to install it just for them. It is used only for Whisper STT and Coqui TTS.
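Because the add-on is a large download, it can help to preview the exact command before running it. The following is a minimal sketch, not part of Speech Note: the function names `addon_for_vendor`, `install_command`, and `addon_installed` are hypothetical helpers, and only the two add-on IDs come from the commands above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not shipped with Speech Note.

# Map a GPU vendor name to the add-on ID quoted in this thread.
addon_for_vendor() {
    case "$1" in
        nvidia|NVIDIA) echo "net.mkiol.SpeechNote.Addon.nvidia" ;;
        amd|AMD)       echo "net.mkiol.SpeechNote.Addon.amd" ;;
        *)             echo "unknown vendor: $1" >&2; return 1 ;;
    esac
}

# Print the install command instead of running it, so nothing is
# downloaded until you decide to run the printed line yourself.
install_command() {
    addon=$(addon_for_vendor "$1") || return 1
    echo "flatpak install $addon"
}

# Succeeds only if the given Flatpak ref is already installed.
# `flatpak info` exits with an error for refs that are not installed
# (this also fails if the flatpak command itself is missing).
addon_installed() {
    flatpak info "$1" >/dev/null 2>&1
}
```

For example, `install_command amd` prints the AMD install line, and `addon_installed "$(addon_for_vendor amd)"` can be used in an `if` to skip the download when the add-on is already present.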

mkiol avatar Feb 01 '24 12:02 mkiol

Thanks for that. As far as I can tell, this does not really work for me: any time I try to use an STT model that supports ROCm, the application closes immediately. It does not matter whether I want to transcribe a file or just hit the listen button. I'm using Linux kernel 5.15.

gitchat1 avatar Feb 01 '24 15:02 gitchat1

What graphics card do you have?

If the app crashes, try enabling "Override GPU version" in the settings. Most likely your GPU is not supported by AMD ROCm, but this "override to supported version" option sometimes resolves the problem.

mkiol avatar Feb 01 '24 17:02 mkiol

I have a Radeon 5700 XT. The programme also recognizes the card; it just crashes.

gitchat1 avatar Feb 01 '24 18:02 gitchat1

And the "Override GPU version" option doesn't fix it, right?

mkiol avatar Feb 01 '24 18:02 mkiol

No, although it does appear to be working for a few seconds before the app crashes, whereas when the override is not active the app crashes instantaneously.

gitchat1 avatar Feb 01 '24 19:02 gitchat1

Thanks for checking.

It looks like you can't use AMD ROCm with your GPU. By the way, according to the AMD docs, your card is not officially supported.

Another option is GPU acceleration with OpenCL. To enable it, please disable "Use AMD ROCm" and "Override GPU version" (like below) and restart the app. After the restart, make sure that the "OpenCL" device is selected under "Graphics card" in the "Speech to Text" settings. Most likely "Auto" will select the right card automatically.

image

OpenCL provides quicker STT on all "Whisper" models (not on "Faster Whisper"). It is not as fast as ROCm but still much faster than the CPU. TTS does not support OpenCL right now.

mkiol avatar Feb 01 '24 20:02 mkiol

Thanks for the tip. That does not seem to be working either, though: my graphics card stays at 10% usage while my CPU goes up to 50%. If it were really using OpenCL, I think that would have to look different, wouldn't it?

gitchat1 avatar Feb 10 '24 18:02 gitchat1

Can this issue now be closed? We now have "Mimic 3" voices.

JamesClarke7283 avatar Mar 19 '24 01:03 JamesClarke7283

Closing this issue since "Mimic 3" is supported.

@gitchat1

Regarding the problems with GPU: the new version 4.6.0 comes with much quicker STT on WhisperCpp models without GPU acceleration. I know it is not a solution, but it might be worth trying.

The new version should be available on Flathub tomorrow.

mkiol avatar Aug 03 '24 13:08 mkiol