python-sdks

transcription_received is missing participant info if participants disconnect before it is received

Open jezell opened this issue 1 year ago • 2 comments

If a participant disconnects before their transcription is delivered, the transcription_received callback won't have participant information because the participant is no longer on the call. That leads to a weird situation where you can't tell who the transcript segment belongs to. Maybe the segment itself needs some participant info?

jezell, Dec 15 '24 18:12

I think this would be difficult for us to coordinate. If the participant is gone from the room, then it seems strange to have transcription come in that isn't attributed to anyone.

Would it be better if we held off and did not fire the event?

davidzhao, Jan 18 '25 08:01

Going by the date this was only a month ago, but man, that seems like ages with the way AI is moving these days.

If I remember correctly, I think there are two separate issues @davidzhao. The first is that the interface itself only exposes the participant object, when really the participant id would be sufficient from a transcript standpoint. The second is that the participant can't be looked up because it is no longer connected, so it gets dropped from the event.

If you are saving the full transcript, you would likely be saving the participants as discrete elements rather than embedding them on every segment. It's easy enough to capture participant info for each participant id when participant joined events come through. While keeping the full participant object around definitely presents some weird challenges, even just exposing the participant id on the segment itself or in the event would help. If I remember correctly, the participant id is available, but since the participant has disconnected the lookup fails and null is passed instead, which is a bit confusing (maybe the interface shouldn't even have the participant object if it can't be reliably looked up?).

I think the challenge with dropping the transcript is that it leads to a loss of information. Since the transcripts may be chunked into a few seconds of audio each, you end up missing entire sentences if someone disconnects right after they speak or mid-sentence.
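
To make the idea concrete, here is a rough sketch of what I mean by storing participants as discrete elements and keying each segment by participant id. It is plain Python and does not use the SDK at all: `TranscriptLog`, `ParticipantRecord`, `TranscriptEntry`, and the `on_*` method names are all hypothetical, and it assumes the event hands over the participant identity as a string, which is the change being asked for here rather than current behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParticipantRecord:
    """Hypothetical snapshot of a participant, captured when they join."""
    identity: str
    name: str
    connected: bool = True


@dataclass
class TranscriptEntry:
    """Hypothetical segment record that carries only the participant id."""
    participant_identity: str
    text: str


@dataclass
class TranscriptLog:
    """Keeps participants and segments separately, joined by identity."""
    participants: Dict[str, ParticipantRecord] = field(default_factory=dict)
    entries: List[TranscriptEntry] = field(default_factory=list)

    def on_participant_connected(self, identity: str, name: str) -> None:
        # Capture the info once, when the participant joins.
        self.participants[identity] = ParticipantRecord(identity, name)

    def on_participant_disconnected(self, identity: str) -> None:
        # Keep the record so late segments can still be attributed.
        record = self.participants.get(identity)
        if record is not None:
            record.connected = False

    def on_transcription(self, participant_identity: str, text: str) -> None:
        # Assumption: the event exposes the identity string even after
        # the participant has left the room.
        self.entries.append(TranscriptEntry(participant_identity, text))

    def speaker_name(self, entry: TranscriptEntry) -> Optional[str]:
        # Resolve the speaker from the stored snapshot, not the live room.
        record = self.participants.get(entry.participant_identity)
        return record.name if record is not None else None


log = TranscriptLog()
log.on_participant_connected("user-1", "Alice")
log.on_participant_disconnected("user-1")
# The segment arrives after the disconnect but is still attributable.
log.on_transcription("user-1", "thanks, bye")
late_speaker = log.speaker_name(log.entries[0])
```

With something like this the late segment is kept and attributed instead of being dropped or arriving with a null participant, and nothing needs to hold on to the full participant object.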

jezell, Jan 20 '25 04:01