sherpa-onnx Add cache mechanism to sherpa tts

Delay is a major drawback of sherpa tts on android phones, which makes it unusable for average blind people using average phones. By caching the most frequent tts requests, sherpa is made as fast as lightning on most of the time, using much less cpu and battery. Some blind individials have tested this upgrade and have been happy :)

Considered guide-lines:

App does not cache user texts due to security and privacy concerns. Just a hash is saved from the texts.
Usage statistics of texts(hashes) are saved to a list, and stored periodically(every 10 minutes). If cache space is filled, wav files that are less used are removed.
Setting amount of cache and the ability to remove cached wav files are added in the MainActivity
Present jni functions are tried to remain intact due to possible compatibility considerations.
When changing speed, cache could be cleared automatically to update the speed of cached media as well, not implemented yet as there is a clear button on MainActivity.kt.
Code is scanned for possible bugs with deepseek AI.

Jan 17 '25 16:01 mah92

Screenshot_20250117-200932_TTS Engine

Jan 17 '25 16:01 mah92

Thank you for your contribution!

Will review it.

Jan 20 '25 01:01 csukuangfj

Please use https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/check_style_cpplint.sh to check the style of your C++ code.

You can refer to https://github.com/k2-fsa/sherpa-onnx/blob/66e02d83815c4f4010e81c479bf690e44b865cae/scripts/check_style_cpplint.sh#L19-L26

You can use

pip install clang-format==12.0.1

to install clang-format and use it to auto-format your code.

Jan 22 '25 04:01 csukuangfj

Thank you, I will correct tomorrow. I wonder if it is better to remove the clear button and remove the cache automatically when the speed is changed. By the way the clear button seems to be displaced after merging to the newer commit.

Jan 22 '25 04:01 mah92

I wonder if it is better to remove the clear button and remove the cache automatically when the speed is changed

Yes, I agree.

By the way, you can even set a default cache size. Users don't need to set the cache size through the UI.

Jan 22 '25 05:01 csukuangfj

Another proposal: whatever we do in the model, there still remains some words that are not spelled correctly(like keyboard letters). Good to add them permanently in the cache so that they be read fron the cache instead of the model?

Jan 22 '25 05:01 mah92

About the cache slider, I think it is good to keep it for a while. Then rethink based on user feedback. I know it is good between 20 and 200 MB but need feedback to fix it(maybe not a single optimal value based on phone). I also think the user may be happier when knowing about the cach size and would accept the extra memory usage easier. Also better for privacy concerns to see the slider.

Jan 22 '25 05:01 mah92

the cache instead of the model?

Ok, it is fine with me.

Jan 22 '25 05:01 csukuangfj

Another proposal: whatever we do in the model, there still remains some words that are not spelled correctly(like keyboard letters). Good to add them permanently in the cache so that they be read fron the cache instead of the model?

Please make the cache do only 1 thing.

You can create a separate PR to add it.

Jan 22 '25 06:01 csukuangfj

Please use https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/check_style_cpplint.sh to check the style of your C++ code. ...

done.

Jan 23 '25 13:01 mah92

Shall I add OfflineTtsCacheMechanismConfig to OfflineTtsConfig or keep it separate?

Jan 24 '25 06:01 mah92

Shall I add OfflineTtsCacheMechanismConfig to OfflineTtsConfig or keep it separate?

Please keep it separate.

Jan 24 '25 07:01 csukuangfj

impl_ is private in csrc/offline-tts.h. Unable to use it in jni/offline-tts.cc private: std::unique_ptr<OfflineTtsImpl> impl_;

Instead I can put OfflineTtsCacheMechanism* cache_; inside OfflineTts itself.

Jan 24 '25 08:01 mah92

You don't need to wrap.impl_ to.jni.

You need to.wrap. CacheMechanism to jni.

Not sure why you.need to wrap impl_ to jni.

Jan 24 '25 08:01 csukuangfj

You told me ealier to update OfflineTtsImpl and add CacheMechansim pointer there. It was impossible to use it from there due to it's private nature. So should I cache the pointer in the jni/offline-tts.cc?

Note that if I cache it in jni/offline-tts.cc, I should move calling all it's methods like GetWavFile there. It means the cache mechansim would only be added to kotin/android.

If I cache the pointer as public in class in csrc/offline-tts.h, it would remain compatible(other projects just won't have cache) but might be easier for them to use cache later(using the second constructor and passing OfflineTtsCaxheMechanismConfig). Still it might not be neat.

Jan 24 '25 09:01 mah92

Yes, I said that.

Let me give.you the pseudo code later.

Jan 24 '25 10:01 csukuangfj

This is what I have done in the last commit, looking forward to your pseudo code... offline-tts-cahce-mechanism-config in both SherpaOnnxTts and SherpaOnnxEngine Removing cache_dir from OfflineTts Removing function implementations from OfflineTts Passing config pointer to OfflineTts Added cache_mechanism_ to OfflineTts (as public, might not be so neat) Changing all cache units in kotlin to Bytes from MB to reduce confusion. Showing is still in MBs

P.N: Due to problems in my side, I was not able to test SherpaOnnxTts directly. I just test on SherpaOnnxEngine

Jan 25 '25 02:01 mah92

So here is the pseudo code.

Create a cache mechanism outside of OfflineTts.

OfflineTtsCacheMechanismConfig config;
// set config

auto cache = new OfflineTtsCacheMechanism(config)

Now pass the pointer to cache to the OfflineTts

OfflineTtsConfig config;

auto tts = new OfflineTts(config, cache);

Now Change OfflineTtsImpl

class OfflineTtsImpl {
public:
  OfflineTtsImpl(const OfflineTtsConfig& config, OfflineTtsCacheMechanism* cache)
   : ...
     cache_(cache) {
   ...
  }

private:
  OfflineTtsCacheMechanism *cache_ = nullptr; // not owned here
};

Note again in the above code, you don't need to add any new methods (except the constructor) of OfflineTts.

You also don't need to care about that impl_ is private in OfflineTts.

You just need to wrap OfflineTtsCacheMechanism and OfflineTtsCacheMechanismConfig. to JNI.

Please don't add any method related to OfflineTtsCacheMechanism to OfflineTts. You don't need to do that.

Jan 25 '25 13:01 csukuangfj

I will catch up soon. By the way, What files needs change in the 1. and 2. part of pseudo code change comment?

Jan 31 '25 07:01 mah92

What files needs change in the 1. and 2. part of pseudo code change comment?

Create a cache mechanism outside of OfflineTts.

You can first wrap the cache mechanism class to kotlin via jni. Then you can create it in kotlin, like how you create offline tts in kotlin.

Now pass the pointer to cache to the OfflineTts

You need to create offline tts in kotlin, which means you need to pass the instance of cache mechanism from kotlin.

Feb 03 '25 14:02 csukuangfj

Hi again. I have progressed, but need some help.

"You just need to wrap OfflineTtsCacheMechanism and OfflineTtsCacheMechanismConfig. to JNI." What do you mean to wrap them? You mean to reimplement them? Which functions should be external? Would you give me pseudo code?
Should I cache "cache" as static in jni/offline-tts.cc? the cache defined in kotlin or the c++ one?
Is the changes I have made to offline-tts-impl OK?
I see the warning "This branch must not contain merge commits.". Should I rebase the whole branch?

Would you please take a look at the changes in my branch? I am confused...

Feb 16 '25 20:02 mah92

By the way, I have made a working thread for cleanup based on user feedback on delays introduced by caching. I will test it more...

Feb 17 '25 03:02 mah92

I think the remaining work is not sth that I can do without your elaborate help. maybe It takes less time from you...

Feb 17 '25 04:02 mah92

sherpa-onnx sherpa-onnx copied to clipboard

Add cache mechanism to sherpa tts

sherpa-onnx
sherpa-onnx copied to clipboard