After upgrade TextToAudioStream.is_playing() isn't behaving as expected anymore

Open manuelseeger opened this issue 7 months ago • 2 comments

I find that is_playing() now always returns False for me. It used to reliably return True while audio was being read out.

I run this simple test:

from time import sleep

from RealtimeTTS import KokoroEngine, TextToAudioStream

engine = KokoroEngine()
tts = TextToAudioStream(engine)

text = "Hello, this is a test of the TTS system. It should read this text aloud."

tts.feed(text)
tts.play_async()

while True:
    print("TTS is playing:", tts.is_playing())
    sleep(1)

It always prints False.

I did some debugging and found that self.is_playing_flag in TextToAudioStream is True while it's speaking (which, from the naming, seems expected). However, with the change linked here, is_playing() will always evaluate to False: https://github.com/KoljaB/RealtimeTTS/blob/22ad82f5c0f70dec618b52919cc4789e80f3da2b/RealtimeTTS/text_to_stream.py#L724
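To make the symptom concrete, here is a toy model of it. It is a sketch under assumptions, not the library's code: FakeStream and other_condition are hypothetical names, and the only things taken from this thread are that is_playing_flag is True during speech and that the return statement has the shape "flag and not something".

```python
class FakeStream:
    """Toy stand-in for illustration only; NOT RealtimeTTS's TextToAudioStream."""

    def __init__(self):
        self.is_playing_flag = False
        # Hypothetical second operand; the real expression may differ.
        self.other_condition = False

    def is_playing(self):
        # Same shape as described in this issue: "flag and not <something>".
        return self.is_playing_flag and not self.other_condition


stream = FakeStream()

# While speaking, the flag is True. If the second operand is also True
# during playback, the combined check can never report True.
stream.is_playing_flag = True
stream.other_condition = True
result_while_speaking = stream.is_playing()
print("is_playing() while speaking:", result_while_speaking)
```

If the second operand happens to be truthy for the whole duration of playback, the method reports False even though the flag itself is correct, which would match what the test script prints.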

It would be great if is_playing() returned True again, as before, while audio is being streamed out.

manuelseeger · May 25 '25 12:05

Oh, man. I thought I did something wrong in my code, but it was the update. I'm not sure which version I had before, but it was probably from a few months ago. I was refactoring my code to use Coqui, and pip updated RealtimeTTS. It turns out that was pointless, since I have a Radeon 6650 XT and am targeting Windows without WSL, so no CUDA and no ROCm, and therefore no DeepSpeed. Without a CUDA GPU, Coqui is not usable for real-time text-to-audio. I rolled my code back to the earlier version and nothing worked correctly. I'm using an iterator with feed() and play_async() whenever a new stream of text is available to be spoken. Only the first audio stream would play, then nothing, since my code was depending on is_playing() for synchronization.

I'm using tts_stream.stream_running and tts_stream.is_playing_flag directly now as a workaround in my code. It feels like I'm digging into the internals to get the needed behavior, but is_playing() doesn't seem to have much use with the "and not" in there. Thanks for tracking down the reason, @manuelseeger! I hope there is an easy fix for @KoljaB to implement! (I'm guessing it might be as simple as removing "not" from that return statement on line 724 of text_to_stream.py, but I'm not sure if that has side effects.)
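For anyone who needs the same workaround, a minimal sketch of it could look like this. The helper name is_speaking is made up, and combining the two attributes with "or" is an assumption (the right combination may depend on your version). The attribute names stream_running and is_playing_flag are the ones mentioned above; they are internals and may change.

```python
from types import SimpleNamespace


def is_speaking(stream) -> bool:
    """Best-effort playback check that bypasses is_playing().

    Reads the two internal attributes named in this thread. Treating the
    stream as busy when either one is truthy is an assumption.
    """
    playing = bool(getattr(stream, "is_playing_flag", False))
    running = bool(getattr(stream, "stream_running", False))
    return playing or running


# Stand-in objects so the sketch runs without RealtimeTTS or audio hardware.
busy = SimpleNamespace(is_playing_flag=True, stream_running=True)
idle = SimpleNamespace(is_playing_flag=False, stream_running=False)

print("busy:", is_speaking(busy))
print("idle:", is_speaking(idle))
```

With a real stream, the call would be is_speaking(tts_stream) wherever the code previously called tts_stream.is_playing(). The getattr defaults mean a missing attribute is treated as "not playing" instead of raising an error.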

Benzolio · May 25 '25 23:05

How did you make it work with ROCm? Can you share the steps so I can run it on an MI300X?

amal5haji · Oct 22 '25 20:10