openhab-core

[tts] Cache mechanism

Open dalgwen opened this issue 3 years ago • 0 comments

Implements a cache mechanism for all TTS services.

Reason :

Online TTS services can be costly, so reducing calls to the cloud is always good. Caching also improves the user experience (lower latency for local services). Amazon Polly TTS and Google TTS each implement their own caching mechanism, and I thought it would be worthwhile to share a single code base between them (and with other services as well).

Functional specification :

The eviction policy is LRU. The cache size is a voice bundle parameter (10 MB by default). Caching can be enabled or disabled system-wide or per TTS service, and it is enabled by default. I'm wondering whether it should default to off, at least initially; what do you think? The cache doesn't wait for the stream to end: it can start serving data as soon as a small chunk (10 kB) is available. As a side effect, the cache can serve several streams concurrently for the same utterance with only one call to the TTS service.

Technical :

A TTS service can opt out of caching through a hard-coded value in its implementation. (The default method in the interface returns true, meaning enabled.)
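As a rough illustration of the opt-out, here is a minimal sketch. The names `TtsServiceSketch`, `cacheEnabled` and `shouldCache` are hypothetical and are not the identifiers used in this PR, and combining the system-wide switch with the per-service flag using a logical AND is an assumption.

```java
/** Hypothetical sketch of a per-service opt-out; names are not from the PR. */
final class CacheToggleSketch {

    /** Stand-in for the TTS service interface. */
    interface TtsServiceSketch {
        /** Services that must never be cached override this and return false. */
        default boolean cacheEnabled() {
            return true;
        }
    }

    /** Assumption: caching applies only if both switches allow it. */
    static boolean shouldCache(boolean systemWideEnabled, TtsServiceSketch service) {
        return systemWideEnabled && service.cacheEnabled();
    }
}
```

A service that does nothing special inherits the default and is cached; one that overrides the method to return false is always called directly.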

An LRU cache (TTSLRUCacheImpl) implements a simple interface (TTSCache, with a getOrSynthetize method). This class uses a doubly linked list with head and tail pointers, plus a hash map (a fairly classic LRU implementation).
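For readers unfamiliar with the pattern, the following is a minimal in-memory sketch of an LRU cache built from a hash map and a doubly linked list, bounded by total byte size. It is not the code of TTSLRUCacheImpl: the class name `LruByteCacheSketch`, the method `getOrCompute` and the choice never to evict the entry that was just inserted are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical, simplified LRU cache bounded by total byte size. */
final class LruByteCacheSketch {

    /** List node; the head side is the most recently used entry. */
    private static final class Node {
        final String key;
        final byte[] value;
        Node prev;
        Node next;

        Node(String key, byte[] value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, Node> index = new HashMap<>();
    private final long maxBytes;
    private long currentBytes;
    private Node head; // most recently used
    private Node tail; // least recently used

    LruByteCacheSketch(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Returns the cached value, or computes, stores and returns it. */
    synchronized byte[] getOrCompute(String key, Supplier<byte[]> supplier) {
        Node node = index.get(key);
        if (node != null) {
            // Cache hit: move the node to the head of the list.
            unlink(node);
            linkFirst(node);
            return node.value;
        }
        byte[] value = supplier.get();
        node = new Node(key, value);
        index.put(key, node);
        linkFirst(node);
        currentBytes += value.length;
        // Evict from the tail until the size limit holds again,
        // but keep the entry that was just added.
        while (currentBytes > maxBytes && tail != node) {
            Node victim = tail;
            unlink(victim);
            index.remove(victim.key);
            currentBytes -= victim.value.length;
        }
        return value;
    }

    synchronized boolean contains(String key) {
        return index.containsKey(key);
    }

    private void linkFirst(Node node) {
        node.prev = null;
        node.next = head;
        if (head != null) {
            head.prev = node;
        }
        head = node;
        if (tail == null) {
            tail = node;
        }
    }

    private void unlink(Node node) {
        if (node.prev != null) {
            node.prev.next = node.next;
        } else {
            head = node.next;
        }
        if (node.next != null) {
            node.next.prev = node.prev;
        } else {
            tail = node.prev;
        }
        node.prev = null;
        node.next = null;
    }
}
```

The hash map gives constant-time lookup, and the linked list gives constant-time reordering and eviction, which is why the two structures are usually paired.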

The TTS AudioStream result is provided by a supplier (AudioStreamSupplier), which can delay the call to the TTS service. This lets the cache service create the cached entry without waiting for synthesis.
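The idea of deferring the TTS call can be sketched with a plain `java.util.function.Supplier`. This is only an illustration of lazy evaluation and not the AudioStreamSupplier class itself: `LazySynthesisSketch` and `memoize` are hypothetical names, and remembering the result after the first call is an assumption.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: defer an expensive call until its result is needed. */
final class LazySynthesisSketch {

    /** Wraps a supplier so that the delegate runs at most once, on first get(). */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;
            private boolean resolved;

            @Override
            public synchronized T get() {
                if (!resolved) {
                    value = delegate.get(); // the costly call happens here
                    resolved = true;
                }
                return value;
            }
        };
    }
}
```

Creating the wrapper costs nothing, so a cache entry can be registered immediately and the synthesis only starts when a reader first asks for data.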

The TTS Results are created when calling the TTS service for the first time, or loaded from the disk at startup. An "info" file is stored alongside the sound file and contains the AudioFormat information.

The TTS Result object provides an audio stream wrapper implementation (AudioStreamCacheWrapper), which is sent to the sink responsible for playing. This wrapper overrides the read() method to call the TTS Result transparently (which in turn provides the data from disk). This allows several clients to request the same TTSResult without waiting, each with its own AudioStream.
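The sharing pattern described here can be sketched as one growing result with several independent readers. This is a simplified illustration and not AudioStreamCacheWrapper itself: it keeps the data in memory instead of on disk, and `SharedResultSketch`, `append`, `finish`, `newStream` and `readAll` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical sketch: one producer fills a buffer, many streams read it. */
final class SharedResultSketch {
    private byte[] data = new byte[0];
    private boolean complete;

    /** Called by the producer each time a new chunk arrives. */
    synchronized void append(byte[] chunk) {
        byte[] grown = Arrays.copyOf(data, data.length + chunk.length);
        System.arraycopy(chunk, 0, grown, data.length, chunk.length);
        data = grown;
        notifyAll(); // wake up readers waiting for more data
    }

    /** Called by the producer once the source stream has ended. */
    synchronized void finish() {
        complete = true;
        notifyAll();
    }

    /** Blocks until the byte at this position exists or the result is complete. */
    private synchronized int readAt(int position) throws IOException {
        while (position >= data.length && !complete) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("interrupted while waiting for data", e);
            }
        }
        return position < data.length ? data[position] & 0xFF : -1;
    }

    /** Each client gets its own stream with its own read position. */
    InputStream newStream() {
        return new InputStream() {
            private int position;

            @Override
            public int read() throws IOException {
                int b = readAt(position);
                if (b >= 0) {
                    position++;
                }
                return b;
            }
        };
    }

    /** Convenience helper for the example: drains a stream into a byte array. */
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            for (int b = in.read(); b != -1; b = in.read()) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Because every stream keeps only a position, a reader that starts late still sees the utterance from the beginning, while the underlying data is produced once.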

A fallback mechanism is implemented: if the cache fails for whatever reason, the TTS service is called directly.
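In its simplest form, such a fallback is a try/catch around the cache lookup. The sketch below only illustrates that shape under stated assumptions: `CacheFallbackSketch`, the nested `Cache` interface, `lookup` and `say` are hypothetical names, and audio is represented as a byte array rather than an AudioStream.

```java
import java.util.function.Supplier;

/** Hypothetical sketch: bypass the cache whenever it fails. */
final class CacheFallbackSketch {

    /** Stand-in for the cache interface; it may fail for any reason. */
    interface Cache {
        byte[] lookup(String key, Supplier<byte[]> tts) throws Exception;
    }

    /** Tries the cache first and calls the TTS service directly on failure. */
    static byte[] say(Cache cache, String key, Supplier<byte[]> tts) {
        try {
            return cache.lookup(key, tts);
        } catch (Exception e) {
            // Assumption: any cache failure is non-fatal, so synthesize directly.
            return tts.get();
        }
    }
}
```

With this shape a broken cache degrades to the uncached behavior instead of preventing speech output.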

Closes #3039

Signed-off-by: Gwendal Roulleau [email protected]

dalgwen, Aug 08 '22 15:08