Indentation Bug in `trt_server.py`

Open DamianB-BitFlipper opened this issue 5 months ago • 5 comments

In the file `trt_server.py`, I suspect that the highlighted lines need to be at the same indentation level as the while loop. Otherwise, in its current form, it makes no sense to me. Just shining some light on this.

DamianB-BitFlipper avatar Feb 01 '24 16:02 DamianB-BitFlipper

Not really, because we only want to send one response to the client. At some point we were sending every response we add to the llm_queue for all updates in the current segment from whisper-live, but then we decided to send only the one that corresponds to the transcription with eos=True.

That said, the `if` at https://github.com/collabora/WhisperFusion/blob/main/whisper_live/trt_server.py#L340-L343 could be at the same level as the outer `if`, and everything should still be fine.

makaveli10 avatar Feb 01 '24 17:02 makaveli10

Thanks for your reply! I understand the logic of only sending the responses with eos. But could there not be a backlog in the llm_queue containing multiple sentences, where the first one ends with an EOS and is followed by another with its own EOS? In the current implementation, the first sentence would be lost.

DamianB-BitFlipper avatar Feb 01 '24 17:02 DamianB-BitFlipper

@DamianB-BitFlipper I'm not sure I understand what you mean by sentences; each llm_response could be multiple sentences or a single word.

> In the current implementation, the first sentence would be lost.

Can you please give an example if you have seen this?

makaveli10 avatar Feb 02 '24 05:02 makaveli10

I wouldn't expect to see this in most cases in practice, because the llm_response queue would empty rather quickly. I am just postulating, from exploring the code and poking at it, that if the transcriber sends [<first sentence, eos=True>, <second sentence here, eos=True>], then, the way the code is written, the first sentence is lost.

I am aware that the transcriber does not put eos=True at the end of sentences, but rather at prolonged pauses of non-voice input. I am using "sentence" here purely as an example.

DamianB-BitFlipper avatar Feb 02 '24 10:02 DamianB-BitFlipper

> I am just postulating, from exploring the code and poking at it, that the transcriber sends: [<first sentence, eos=True>, <second sentence here, eos=True>], the way the code is written, the first sentence is lost.

@DamianB-BitFlipper Okay, so we should never reach this state in a short-exchange conversation, i.e. we transcribe until eos=True, and at that point the llm_queue should be:

[{output1, eos=False}, ..., {outputn, eos=True}]

We only care about outputn at this moment, because it is the most recent llm_output and corresponds to the most up-to-date transcription. Not sure why you would want output1.

makaveli10 avatar Feb 05 '24 18:02 makaveli10
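
For illustration, here is a minimal sketch of the two queue states discussed in this thread. It is not the code in `trt_server.py`: the helper names (`drain_keep_last_eos`, `drain_all_eos`, `make_queue`), the dict keys (`llm_output`, `eos`), and the use of a plain `queue.Queue` are all assumptions made for the example. It only shows how "keep the latest eos=True item" differs from "keep every eos=True item" when a backlog builds up.

```python
import queue


def drain_keep_last_eos(llm_queue):
    """Drain the queue and return only the most recent item with eos=True.

    Hypothetical helper: mirrors the "send one response" policy described
    in the thread. Returns None if no eos=True item was queued.
    """
    last = None
    while True:
        try:
            item = llm_queue.get_nowait()
        except queue.Empty:
            break
        if item["eos"]:
            last = item  # an earlier eos=True item is overwritten here
    return last


def drain_all_eos(llm_queue):
    """Drain the queue and return every item with eos=True, in order.

    Hypothetical alternative that would not drop a backlogged response.
    """
    finished = []
    while True:
        try:
            item = llm_queue.get_nowait()
        except queue.Empty:
            break
        if item["eos"]:
            finished.append(item)
    return finished


def make_queue(items):
    """Build a queue.Queue pre-filled with the given items (example only)."""
    q = queue.Queue()
    for item in items:
        q.put(item)
    return q


# State the maintainer expects: partial updates, then one final eos=True item.
expected_state = [
    {"llm_output": "output1", "eos": False},
    {"llm_output": "outputn", "eos": True},
]

# Postulated backlog: two completed segments queued before the loop runs.
backlog_state = [
    {"llm_output": "first", "eos": True},
    {"llm_output": "second", "eos": True},
]

if __name__ == "__main__":
    print(drain_keep_last_eos(make_queue(expected_state)))
    print(drain_keep_last_eos(make_queue(backlog_state)))
    print(drain_all_eos(make_queue(backlog_state)))
```

With the expected state, both policies yield the same final response. They diverge only in the postulated backlog case, where the keep-last policy returns the second item and the first is dropped.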