Some optimization suggestions

Open OptimisticGeek opened this issue 2 months ago • 7 comments

Hello, Sharrnah

While using it, I have come up with several optimization suggestions:

  1. Add a button to clear the history on the right
  2. Add a checkbox to control whether the history is displayed in the list
  3. Optimize the display of the history list and make the row height adaptive. Currently, the translation and the original text are rendered mixed together
  4. Add shortcut keys to control whether data is sent to VRC through OSC.
  5. Support dynamically switching the microphone so that it can recognize either in-game speech or your own voice. Or support both in-game voice and your own voice at once, which would save more memory than opening two instances of Tiger. Right now, because of graphics card limitations, I have to make the difficult choice of using only one of them.

Your Chinese friend, OptimisticGeek


OptimisticGeek avatar Apr 17 '24 14:04 OptimisticGeek

Thanks for the suggestions.

  1. Add a button to clear the history on the right

That should be no problem. Should this also affect the CSV export? I ask because the two are saved differently.

  2. Add a checkbox to control whether the history is displayed in the list

What would this be good for? If you don't need the history, you don't need to use it. Just curious. If you give me a good reason, I am more than happy to add it.

  3. Optimize the display of the history list and make the row height adaptive. Currently, the translation and the original text are rendered mixed together

This is a known issue with the UI library currently in use. The new version of it should allow this, but it has heavy memory issues when using a font that supports all the different language characters, so I have stayed on the current, older version. They are already improving it, so this will hopefully soon be a thing of the past.

  4. Add shortcut keys to control whether data is sent to VRC through OSC.

Do you have a suggestion for a shortcut key? Or should it be configurable, similar to the push-to-talk configuration?

  5. Support dynamically switching the microphone so that it can recognize either in-game speech or your own voice. Or support both in-game voice and your own voice at once, which would save more memory than opening two instances of Tiger. Right now, because of graphics card limitations, I have to make the difficult choice of using only one of them.

This is on my roadmap, but I can't give a date for when it will be ready, since it is not so simple given how everything currently works together.

Sharrnah avatar Apr 17 '24 18:04 Sharrnah

That should be no problem. Should this also affect the CSV export? I ask because the two are saved differently. What would this be good for? If you don't need the history, you don't need to use it. Just curious. If you give me a good reason, I am more than happy to add it.

I don't need the history. I want to minimize my hardware usage as much as possible, but I didn't know how to turn it off. Is there a risk of memory leaks if the list grows too long? I suggest storing all records in a local database and exporting them from the database when saving the CSV.
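
To make the database suggestion concrete, here is a minimal sketch using Python's standard `sqlite3` and `csv` modules. It is not how Whispering Tiger stores its history; the table layout and the function names (`open_history`, `add_entry`, `clear_history`, `export_csv`) are assumptions made up for illustration.

```python
import csv
import io
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) a history database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "original TEXT NOT NULL, "
        "translation TEXT NOT NULL)"
    )
    return conn


def add_entry(conn, original, translation):
    """Store one transcription together with its translation."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO history (original, translation) VALUES (?, ?)",
            (original, translation),
        )


def clear_history(conn):
    """Remove every stored record (the 'clear history' button)."""
    with conn:
        conn.execute("DELETE FROM history")


def export_csv(conn, out):
    """Write all records, oldest first, to a file-like object as CSV."""
    writer = csv.writer(out)
    writer.writerow(["original", "translation"])
    rows = conn.execute(
        "SELECT original, translation FROM history ORDER BY id"
    )
    writer.writerows(rows)


if __name__ == "__main__":
    conn = open_history()
    add_entry(conn, "你好", "Hello")
    buffer = io.StringIO()
    export_csv(conn, buffer)
    print(buffer.getvalue())
```

With this layout the on-screen list would only need to hold whatever is currently displayed, and clearing the history and exporting the CSV would both read from the same store.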

Do you have a suggestion for a shortcut key? Or should it be configurable, similar to the push-to-talk configuration?

Yes, the shortcuts need to be configurable to avoid conflicts with other applications. It would be best to support keys such as F1~F12, PrtSc, Home, and Pause.
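
As a rough sketch of the configurable shortcut idea: the class below keeps a user-chosen key and flips an "OSC sending" flag when that key is reported. Everything here (`ALLOWED_KEYS`, `OscToggle`, the default `F9`) is a hypothetical name for illustration, not part of Whispering Tiger, and the sketch leaves out actual global key capture, which would need a hotkey library.

```python
# Keys the user may bind, taken from the suggestion above (F1~F12, PrtSc, Home, Pause).
ALLOWED_KEYS = {f"F{n}" for n in range(1, 13)} | {"PrtSc", "Home", "Pause"}


class OscToggle:
    """Tracks whether OSC sending is enabled and which key toggles it."""

    def __init__(self, hotkey="F9"):
        self.hotkey = self._validate(hotkey)
        self.osc_enabled = True

    @staticmethod
    def _validate(key):
        if key not in ALLOWED_KEYS:
            raise ValueError(f"unsupported shortcut key: {key!r}")
        return key

    def rebind(self, key):
        """Change the shortcut, e.g. when it conflicts with another application."""
        self.hotkey = self._validate(key)

    def on_key(self, key):
        """Call this for every key press; returns the current OSC state."""
        if key == self.hotkey:
            self.osc_enabled = not self.osc_enabled
        return self.osc_enabled


if __name__ == "__main__":
    toggle = OscToggle("F9")
    print(toggle.on_key("F9"))   # OSC sending is now switched off
    toggle.rebind("Pause")
    print(toggle.on_key("Pause"))  # and switched back on with the new key
```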

This is on my roadmap, but I can't give a date for when it will be ready, since it is not so simple given how everything currently works together.

I agree that this is a complicated feature. Please share the exciting news when you make progress.

OptimisticGeek avatar Apr 17 '24 19:04 OptimisticGeek

I don't need the history. I want to minimize my hardware usage as much as possible, but I didn't know how to turn it off. Is there a risk of memory leaks if the list grows too long? I suggest storing all records in a local database and exporting them from the database when saving the CSV.

There should be no issue, and I am not aware of any memory leak caused by the history. Since it is only text, the RAM usage should not be a problem, and it only uses system RAM, not video RAM.

The only possible memory leak I might have found occurs when using Faster Whisper, realtime mode, and "Run each transcription in a separate thread" together. That is most likely an issue with Faster Whisper, though, or maybe something specific to my PC, since I have not heard any complaints yet. But it is why I added the "Run each transcript in a separate thread" option.

Sharrnah avatar Apr 17 '24 19:04 Sharrnah

The only possible memory leak I might have found occurs when using `faster whisper`, `realtime mode`, and `Run each transcription in a separate thread` together. That is most likely an issue with Faster Whisper, though, or maybe something specific to my PC, since I have not heard any complaints yet. But it is why I added the "Run each transcript in a separate thread" option.

I haven't had any crashes so far.

OptimisticGeek avatar Apr 17 '24 20:04 OptimisticGeek

  5. Support dynamically switching the microphone so that it can recognize either in-game speech or your own voice. Or support both in-game voice and your own voice at once, which would save more memory than opening two instances of Tiger. Right now, because of graphics card limitations, I have to make the difficult choice of using only one of them.

Would it be easier to implement if the microphone and the speaker were switched with a shortcut key? This would not be parallel recognition. It would keep a single recognition mode and only change the input source.

Example:

  1. Switch to the speaker and use the speaker as an input source to listen for in-game speech
  2. Switch to the microphone and use the microphone as an input source to listen to your own voice

OptimisticGeek avatar Apr 18 '24 17:04 OptimisticGeek

It is not so much about switching the device, but about everything that is attached to it. For example, you probably do not want the text-to-speech to trigger when you have recorded someone else speaking. Or possibly you do not want to send the other person's text over OSC, or you do not want realtime mode for your own speech but you do want it for the other person, etc.
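
To illustrate the point, here is a small sketch in which every input source carries its own bundle of settings, so that switching the source also switches what happens with the recognized text. The names (`SourceProfile`, `PROFILES`, `InputSwitcher`) and the chosen values are assumptions for illustration only and do not reflect how Whispering Tiger is structured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """Settings that would need to change together with the input device."""
    realtime: bool      # use realtime mode for this source
    send_osc: bool      # forward recognized text over OSC
    trigger_tts: bool   # read recognized text aloud


# Hypothetical defaults following the examples above: own voice goes to
# OSC and text-to-speech, other people's speech is only transcribed in realtime.
PROFILES = {
    "microphone": SourceProfile(realtime=False, send_osc=True, trigger_tts=True),
    "speaker": SourceProfile(realtime=True, send_osc=False, trigger_tts=False),
}


class InputSwitcher:
    def __init__(self, profiles, active):
        self.profiles = profiles
        self.switch(active)

    def switch(self, source):
        """Select another input source and, with it, its whole profile."""
        if source not in self.profiles:
            raise KeyError(f"unknown input source: {source}")
        self.active = source

    @property
    def realtime(self):
        return self.profiles[self.active].realtime

    def actions_for(self, text):
        """List what should happen with a finished transcription."""
        profile = self.profiles[self.active]
        actions = []
        if profile.send_osc:
            actions.append(("osc", text))
        if profile.trigger_tts:
            actions.append(("tts", text))
        return actions


if __name__ == "__main__":
    switcher = InputSwitcher(PROFILES, "microphone")
    print(switcher.actions_for("hello"))
    switcher.switch("speaker")
    print(switcher.realtime, switcher.actions_for("hello"))
```

Even in this toy form, a single shortcut has to swap a whole group of settings rather than just the device, which is where the extra complexity comes from.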

I appreciate the suggestion though.

Sharrnah avatar Apr 18 '24 20:04 Sharrnah

It is not so much about switching the device, but about everything that is attached to it. For example, you probably do not want the text-to-speech to trigger when you have recorded someone else speaking. Or possibly you do not want to send the other person's text over OSC, or you do not want realtime mode for your own speech but you do want it for the other person, etc.

I appreciate the suggestion though.

Yes, I didn't think it through enough. I hope my ideas can give you some inspiration. I really like Tiger!

OptimisticGeek avatar Apr 18 '24 23:04 OptimisticGeek