openhab-core icon indicating copy to clipboard operation
openhab-core copied to clipboard

Implement a cache mecanism in the voice manager

Open dalgwen opened this issue 3 years ago • 10 comments

Hello all,

I have an ongoing pull request for a TTS service (Mimic) (edit : pull request accepted). In the thread, I suggested to @lolodomo that a cache system, that all TTS services could use, could be a good idea. He suggested, thanks to him, to open an issue here to discuss it.

If it is deemed a good idea, I think I could work on it and offer a pull request.

There is already two cache systems (for Amazon poly and Google TTS), and they are pretty similar. I think a generic one could replace them and provide the same functionnality for the others.

Some things to think about :

  • A LRU cache with, as an eviction parameter, a limit to the size of all the cached sound ?
  • The key should be a md5sum (or other) from the voice id + the text + optionnaly other parameters (maybe the key definition could be overriden by the TTS service ?). A text file alongside the sound file could contain the text and parameters to make further validation (as it is already the case in the poly and google tts)
  • The cache should be opt-out (the TTS service could refuse it, but otherwise it is automatically used)
  • The cache MUST respect the stream pattern : it shouldn't be allowed to wait for the entire file and copy it to disk before sending a new stream to the calling script. It must provide a stream of byte as soon as possible (i.e. caching and playing at the same time)
  • The cache should be invalidated if the TTS service asks it. For example if the TTS service changes its parameter (speed, etc.)
  • The cache should be able to handle more complex use case than the simple "done -> record to disk -> get it next time". For example, if an openHAB script call the say command simultaneously in several threads for several audio sinks across the house, it should make only one call, and serves a stream to every sink asking for it in parallel. It could require the cache to be a two level cache, (one in memory, and one on files, as reading and writing file in java at the same time is not a good idea).
  • to be completed

Thank you for reading and for your comments

dalgwen avatar Jul 11 '22 23:07 dalgwen

In my opinion, we should have an additional say method with a new boolean parameter to use or not the cache.

https://github.com/openhab/openhab-core/blob/main/bundles/org.openhab.core.voice/src/main/java/org/openhab/core/voice/internal/VoiceManagerImpl.java#L200

Of course, this new parameter should be exposed in REST API and rule engines,

This option (cache or not) should not be per TTS service.

Regarding caching audio in memory, take care, OH is often run on machine with small amount of memory (like RPI) so this should be an option, disabled by default. For the same reason, caching should be disabled per default.

Note that there is also a cache mechanism in the voices TTS service.

lolodomo avatar Jul 12 '22 06:07 lolodomo

Thank you for the comments,

Regarding caching audio in memory, take care, OH is often run on machine with small amount of memory (like RPI) so this should be an option, disabled by default.

I would try to avoid it (maybe with the help of the JavaRandomAccessFile which allows concurrent read and write of a file), but if I really need it, I was thinking about something very small. 1mb, immediately deleted after the completion of the file. Its purpose is just to serve streams concurrently. In fact, many TTS service are already using something like that (the files downloaded are often keeped as a whole in memory and send to the sink with a ByteArrayInputStream, which I think is a bad practice and I plan to change it in the Mimic TTS service) If it is this small, I think that should be OK, what do you think ?

For the same reason, caching should be disabled per default.

Don't you think many users won't know and won't use it ? IMHO, even some very small cache could be extremely usefull. A 10 mb cache on file could store easily one hundred short sentences. I could also adapt it to the available disk space (if the available space is low, then caching is disabled) In this form, would it be OK to make it enabled by default ?

This option (cache or not) should not be per TTS service.

I was thinking about a isCacheEnabled method in the TTSService interface, which returns a boolean saying if the cache mecanism is suited for this specific TTS service (it should be a hardcoded value, for the developer). Are we talking about the same thing ? If so, why do you think this shouldn't be per TTS service ?

dalgwen avatar Jul 13 '22 10:07 dalgwen

If it is this small, I think that should be OK, what do you think ?

Yes of course if small.

Don't you think many users won't know and won't use it ? IMHO, even some very small cache could be extremely usefull. A 10 mb cache on file could store easily one hundred short sentences. I could also adapt it to the available disk space (if the available space is low, then caching is disabled) In this form, would it be OK to make it enabled by default ?

Would it be possible to limit the cache size (with a setting) ? And define what to do when the cache is full and a new "say" is called ? If we could control the cache size, I think it would be OK to enable it by default.

I was thinking about a isCacheEnabled method in the TTSService interface, which returns a boolean saying if the cache mecanism is suited for this specific TTS service (it should be a hardcoded value, for the developer). Are we talking about the same thing ? If so, why do you think this shouldn't be per TTS service ?

Ok, that is fine. It could also be used (returning false) if the TTS service already implemented its cache mechanism.

lolodomo avatar Jul 16 '22 07:07 lolodomo

What we have here is an issue that we observed several times in the past: The same functionality is needed by different add-ons and it would be very good to have some sort of "shared code". Another example of this is the code that handles different channel-types in the HTTP and MQTT binding.

I don't like duplicated code, so I'm all for adding code as that usually comes with good test coverage and the same bug can easily be fixed for all add-ons using that code.

I'm not sure we should add such very specific code to core as this contradicts the idea of a small, extensible core (look e.g. at #3041 where a reduction in core-functionality is requested). I think that a cache that is "per TTS service" should not be part of core, it would be much better to have a org.openhab.voice.commons bundle that is consumed by the other voice bundles and contains the implementation for such a cache. Unfortunately there is a policy that prevents dependencies between add-on bundles.

If such a cache is implemented it should have a clear exposed interface and a fully separated implementation. We should avoid mixing API and implementation, this always brings us into big trouble later. Bad examples are the AbstractWatchService or the AbstractFileTransformationService which have a very tight coupling between internal implementation and exposed API.

J-N-K avatar Jul 16 '22 08:07 J-N-K

Thank you again @lolodomo and @J-N-K I made a first draft with your remarks in mind and will post a PR very soon.

Would it be possible to limit the cache size (with a setting) ? And define what to do when the cache is full and a new "say" is called ?

Yes exactly, it was my intention. I implemented this (10 mb default). If the cache is full, the eviction policy is LRU (the Least Recently Used will be deleted). And if the free space (at startup) is not twice the requested cache size, the cache is disabled.

Ok, that is fine. It could also be used (returning false) if the TTS service already implemented its cache mechanism.

I made a default method in the TTSService interface, returning true to enable the caching mechanism. I chose a default method to not introduce a breaking change in the TTSService implementation addon. And as you said, the TTS service already implementing the cache mechanism could override it to return false. 1- Is "true" the right default parameter? 2- If a default method is not OK, I can make a PR to add to all the TTS addon the proper modification.

I think that a cache that is "per TTS service" should not be part of core,

I also added a side-effect functionnality behind the cache system : The ability to serve a unique TTS result stream to several clients simultaneously, (of course, without waiting for the original stream to be fully cached). I think it is an interesting first step torward a use case that I think could be needed : the ability to "say" something across several sinks simultaneously in the house. So this could be a nice addition/justification for the core ?

dalgwen avatar Aug 08 '22 15:08 dalgwen

Of course, this new parameter should be exposed in REST API and rule engines,

I forgot to answer this : I have trouble to imagine a need for this parameter, as it would be a per request parameter. IMO, it should be activated or not (it works, or not), but choosing to do so per utterance seems overkill. Do you have a use case in mind for me to understand ?

dalgwen avatar Aug 08 '22 15:08 dalgwen

I forgot to answer this : I have trouble to imagine a need for this parameter, as it would be a per request parameter. IMO, it should be activated or not (it works, or not), but choosing to do so per utterance seems overkill. Do you have a use case in mind for me to understand ?

Sometimes, the user requests to say something that has no sense to put in a cache, like for example saying the current hour. It could be cool to give the user the ability to give this indication when he calls "say". This would be an additional parameter to bypass the cache. By default, cache would be use. WDYT ?

lolodomo avatar Aug 10 '22 08:08 lolodomo

You are right, I didn't think about this. I will try to make a commit on the branch for a console command and for the rest service (maybe at the end of next week).

dalgwen avatar Aug 10 '22 08:08 dalgwen

You are right, I didn't think about this. I will try to make a commit on the branch for a console command and for the rest service (maybe at the end of next week).

What would be the most important is the rule action. Console command is just used for testing.

This could be done in a next (separate) PR later. We could consider that feature as a bonus.

lolodomo avatar Aug 10 '22 09:08 lolodomo

Hello @lolodomo,

I made additional commits on this branch.

One adds the enableCache parameter when using rule action in GUI. Another one adds the enableCache in the DSL say action. And another one for the rest service.

I don't know where to modify the code for the console command ? Beside, do you know how to use the console in the development environnement ? I have the feeling that it is not enabled by default, am I right ?

For the record, here is the rule I choose to establish if the cache is enabled for the current action (another commit) :

  • The parameter in the DSL rule or in the GUI or in the rest service, if defined, is prioritary.
  • If the parameter is not filled, then the global parameter (The voice bundle parameter configuration) AND the TTS implementation method have to be enable, to enable the cache WDYT ?

dalgwen avatar Sep 04 '22 07:09 dalgwen