
Choice of model size

Open BurrHM opened this issue 4 months ago • 17 comments

For some languages, the base model is not good enough. Can you allow using bigger models?

BurrHM avatar Aug 07 '25 11:08 BurrHM

> For some languages, the base model is not good enough. Can you allow using bigger models?

Yes, this is possible. It will require some UX design changes.

tosinonikute avatar Aug 07 '25 12:08 tosinonikute

Thanks

BurrHM avatar Aug 07 '25 12:08 BurrHM

+1 For Model selection!

LoredCast avatar Aug 11 '25 11:08 LoredCast

Another +1 for model selection

RafiMuhluvch33s3 avatar Aug 24 '25 13:08 RafiMuhluvch33s3

+1 for this

The base model is very weak, especially when it comes to economic terms that I use in my lectures.

ballerbude avatar Aug 25 '25 10:08 ballerbude

> +1 for this
>
> The base model is very weak, especially when it comes to economic terms that I use in my lectures.

@ballerbude We must consider that this runs on a mobile device with limited processing power, not in the cloud. I have tried larger models, such as ggml-small.bin (~500 MB), and audio processing takes very long on the device because it requires processing power that most devices don't have.

This is why other transcription apps use OpenAI's API directly, which runs your audio through models as large as 15 GB and gives you accurate, advanced transcriptions. However, this app prioritizes privacy and the ability to keep your voice recordings local and separate from APIs, since API providers gain the right to access and use your data once it is received.

If this were fully implemented, the app would cease to be an offline app and would rely entirely on OpenAI APIs.

tosinonikute avatar Aug 25 '25 16:08 tosinonikute

@tosinonikute thanks for the explanation, I get that.

Because of those limitations, I'd love to see the option to use a self-hosted Whisper instance or something similar.

ballerbude avatar Aug 25 '25 22:08 ballerbude

> @tosinonikute thanks for the explanation, I get that.
>
> Because of those limitations, I'd love to see the option to use a self-hosted Whisper instance or something similar.

When you say 'self-hosted,' do you mean uploading your own Whisper model to a server that you can reference in the app?

If yes, then the issue is not really about the server. The model is hosted on Hugging Face, so server resources are not the limiting factor; what matters are the device's resources. For instance, the latest Google Pixel 10 Pro can handle a 500 MB model efficiently during transcription, while a Samsung Galaxy S10 will struggle to process a model of that size, or crash. Since users run all sorts of devices, many will experience failures when they try to use bigger models, and it will seem the app isn't working.
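To make the trade-off concrete, here is a minimal sketch of how an app could pick the largest model a given device can handle. The model names and file sizes match the published ggml Whisper variants; the RAM-headroom heuristic and thresholds are my own illustrative assumptions, not anything NotelyVoice actually does.

```python
# Hypothetical device-aware model selection. Sizes are the approximate
# ggml Whisper model file sizes; the headroom multiplier (assumed) covers
# weights plus runtime working memory (activations, audio buffers).
MODELS = [
    ("ggml-small.bin", 488),  # better accuracy, heavy for low-end phones
    ("ggml-base.bin", 148),   # the current default
    ("ggml-tiny.bin", 78),    # last-resort fallback
]

def pick_model(available_ram_mb: int, headroom: float = 4.0) -> str:
    """Return the largest model that still leaves working headroom."""
    for name, size_mb in MODELS:
        if available_ram_mb >= size_mb * headroom:
            return name
    return MODELS[-1][0]  # always fall back to the smallest model
```

With this heuristic a flagship with 4 GB free would get `ggml-small.bin`, while a device with only a few hundred MB free would be kept on `ggml-base.bin` or `ggml-tiny.bin` rather than crashing.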

tosinonikute avatar Aug 30 '25 07:08 tosinonikute

What about having two options: API or offline? If the offline model goes OOM (out of memory), then the user is asked whether they want to use the API. Or maybe they just get an alert explaining that they can choose a smaller model or use the API (which I guess will require an API key).
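The fallback flow described above could be sketched roughly as follows. All names here (`transcribe`, the engine callables, `ask_user`) are hypothetical; this only illustrates the control flow, not the app's actual code.

```python
# Sketch of the proposed offline-first flow with an opt-in API fallback:
# try on-device transcription; on out-of-memory, ask before going online.
class OutOfMemoryError(Exception):
    pass

def transcribe(audio, local_engine, api_engine, ask_user):
    try:
        return local_engine(audio)          # offline path, tried first
    except OutOfMemoryError:
        choice = ask_user("Device ran out of memory. Retry via the API?")
        if choice == "api":
            return api_engine(audio)        # user explicitly opted in
        raise  # user declined: surface the error instead of going online
```

The key property is that audio only ever leaves the device after an explicit user choice, which preserves the privacy stance discussed earlier in the thread.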

By the way, my understanding is that OpenAI will NOT use the data for training their models, not even when using the free Whisper tier. Source: https://platform.openai.com/docs/guides/your-data

00sapo avatar Sep 20 '25 07:09 00sapo

@00sapo Thanks for your suggestion and the link you shared.

That sounds like a good idea. The only downside is the complexity for non-technical users: most regular users don't know what an "API key" is or how to get one, so I'd end up with users asking "Where do I get the key?" or "Why is my key not working?", etc.

I might go with a separate settings screen with an advanced section where users can choose an online mode and add an API key. That begs another question: are users willing to pay OpenAI's standard Whisper transcription rate of $0.006 per minute (i.e. 0.6 cents per minute of audio), or $0.012 per minute for higher quality? The OpenAI API won't be free.
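For a sense of scale, the quoted rates work out like this (a trivial calculation, using only the per-minute prices mentioned above):

```python
# Back-of-envelope transcription cost at the rates quoted in this thread:
# $0.006/min standard, $0.012/min higher quality.
def transcription_cost(minutes: float, rate_per_min: float = 0.006) -> float:
    """Cost in dollars for a recording of the given length."""
    return round(minutes * rate_per_min, 4)

transcription_cost(60)         # 60-min lecture, standard: $0.36
transcription_cost(60, 0.012)  # 60-min lecture, higher quality: $0.72
```

So even a full hour-long lecture costs well under a dollar at these rates, which may make the paid online mode more palatable than it first sounds.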

tosinonikute avatar Sep 20 '25 10:09 tosinonikute

From my understanding, the Whisper free tier should be OK for many users (3 requests per minute, 200 requests per day). For long audio that may require more than 3 requests per minute, there should be logic that checks the remote server's error response to decide whether to wait or not...
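That wait-or-not logic could look something like the sketch below. It assumes the standard convention for rate-limited APIs (an HTTP 429 response, optionally carrying a retry delay); the function names and the payload shape are hypothetical, not OpenAI's actual response format.

```python
import time

# Illustrative client-side rate-limit handling: retry a chunk upload
# when the server answers 429, waiting either the server-suggested
# delay or an exponential backoff (assumed payload key "retry_after").
def send_with_backoff(send_chunk, chunk, max_retries: int = 5):
    """send_chunk(chunk) -> (status_code, payload_dict)."""
    for attempt in range(max_retries):
        status, payload = send_chunk(chunk)
        if status != 429:
            return payload                       # success (or non-rate error)
        delay = payload.get("retry_after", 2 ** attempt)
        time.sleep(delay)                        # wait before retrying
    raise RuntimeError("rate limit retries exhausted")
```

For a long recording split into chunks, the app would call this per chunk, so the user just sees a slower transcription instead of a hard failure when the free-tier limit is hit.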

00sapo avatar Sep 20 '25 12:09 00sapo

I think the ability to let users choose better models is a must for longevity. I tried to transcribe Arabic and it was not good. This keyboard is great when it comes to Arabic; it's also open source.

medo2132112 avatar Oct 02 '25 08:10 medo2132112

> I think the ability to let users choose better models is a must for longevity. I tried to transcribe Arabic and it was not good. This keyboard is great when it comes to Arabic; it's also open source.

Thanks I’ll be working on this really soon

tosinonikute avatar Oct 05 '25 02:10 tosinonikute

@tosinonikute That's great to hear

medo2132112 avatar Oct 05 '25 20:10 medo2132112

What model sizes are considered? How big are they?

It would be great if users could choose a better model.

bi4key avatar Oct 12 '25 22:10 bi4key

> What model sizes are considered? How big are they?
>
> It would be great if users could choose a better model.

It's around 468 MB in size. I just made a release here: https://github.com/tosinonikute/NotelyVoice/releases/tag/v1.2.6

Hopefully this will be much better than the base model. I also plan to implement @00sapo's suggestion that the Whisper free tier should be OK for many users (3 requests per minute, 200 requests per day), but that will take a little while. Still in the works.

tosinonikute avatar Oct 23 '25 18:10 tosinonikute

Can you also add the option for the app to use the GPU rather than the CPU? It would make transcriptions faster.

guranu avatar Nov 03 '25 14:11 guranu