amazon-chime-sdk-js

Chime real time transcription - identify language

rfic opened this issue 3 years ago · 3 comments

What are you trying to do?

I set up a meeting and used real-time transcription. The speakers use different languages, so I configured the transcription engine to identify the language. To my surprise, the results are poor: the vast majority of the content is transcribed in the default language. When I use the transcription service on its own, the results are radically better.

Perhaps Chime splits the sound into small chunks, and the transcription service cannot detect the correct language.

I am looking for a solution.

How can the documentation be improved to help your use case?

The documentation guide is certainly out of date.

What documentation have you looked at so far?

I have scanned the Developer Guide, GitHub, and lots of websites.

rfic avatar Jul 13 '22 18:07 rfic

Hi @rfic, when we send audio to Amazon Transcribe, we break the audio up into chunks because we use their streaming API. That chunking is expected; however, we would not necessarily expect any worse performance compared to passing the audio directly to Amazon Transcribe's streaming API.
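For illustration only, here is a minimal sketch of how audio is typically chunked and fed to the Transcribe streaming API with @aws-sdk/client-transcribe-streaming. The pcmChunks source, chunk size, and language options are assumptions for the example; this is not the actual Chime media pipeline:

// Minimal sketch: chunked PCM audio pushed to the Transcribe streaming API.
// `pcmChunks` is assumed to be an async iterable of small Uint8Array buffers
// (e.g. ~100 ms of 16 kHz PCM each); the names here are placeholders.
const {
  TranscribeStreamingClient,
  StartStreamTranscriptionCommand,
} = require('@aws-sdk/client-transcribe-streaming');

const client = new TranscribeStreamingClient({ region: 'us-east-1' });

async function* audioStream(pcmChunks) {
  for await (const chunk of pcmChunks) {
    yield { AudioEvent: { AudioChunk: chunk } };
  }
}

async function transcribe(pcmChunks) {
  const response = await client.send(
    new StartStreamTranscriptionCommand({
      IdentifyLanguage: true,
      LanguageOptions: 'en-US,de-DE',
      MediaEncoding: 'pcm',
      MediaSampleRateHertz: 16000,
      AudioStream: audioStream(pcmChunks),
    })
  );
  // Print the detected language and transcript for each result.
  for await (const event of response.TranscriptResultStream) {
    const results = event.TranscriptEvent?.Transcript?.Results ?? [];
    for (const result of results) {
      console.log(result.LanguageCode, result.Alternatives?.[0]?.Transcript);
    }
  }
}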

A few questions to help us understand the problem better: When you refer to using the transcription service on its own, are you referring to using the Amazon Transcribe streaming APIs or the batch APIs? In addition, do you have a specific audio example to reproduce the poor results that you are seeing?

richhx avatar Jul 15 '22 00:07 richhx

To add to @richhx's comment, the underlying Transcribe streaming API, which provides the language identification feature, is only able to detect the dominant language of the stream. You can find more information in the public doc. That is, if multiple languages are present in the audio stream, Transcribe Streaming will detect the first dominant language in the meeting and will not change it during the course of the meeting [unless you stop and re-start transcription with these settings].

As of right now, the Transcribe streaming API can only detect a single, dominant language in the stream.
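For reference, a rough sketch of the stop/re-start workaround mentioned above, using the chime-sdk-meetings namespace; the meetingId and language options here are placeholders, not a recommended configuration:

// Stop the current transcription session and start a new one so that
// language identification runs again and can lock onto a different
// dominant language.
const {
  ChimeSDKMeetingsClient,
  StartMeetingTranscriptionCommand,
  StopMeetingTranscriptionCommand,
} = require('@aws-sdk/client-chime-sdk-meetings');

const client = new ChimeSDKMeetingsClient({ region: 'us-east-1' });

async function restartTranscription(meetingId) {
  // Stop the running transcription session...
  await client.send(new StopMeetingTranscriptionCommand({ MeetingId: meetingId }));

  // ...then start a fresh one with language identification enabled.
  await client.send(new StartMeetingTranscriptionCommand({
    MeetingId: meetingId,
    TranscriptionConfiguration: {
      EngineTranscribeSettings: {
        IdentifyLanguage: true,
        LanguageOptions: 'en-US,de-DE',
        Region: 'us-east-1',
      },
    },
  }));
}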

Or are you referring to the multi-language identification of the Transcribe batch API, as per the public doc/API doc?

Chitz avatar Jul 19 '22 17:07 Chitz

Hello @richhx, @Chitz, thanks for the comments.

I am using the "Amazon Chime SDK" media service, which is connected directly to Amazon Transcribe. My project is based on the Amazon Chime SDK for Telemedicine. I made some updates to the Lambda function that starts transcription.

My current configuration:

const command = new StartMeetingTranscriptionCommand({
  MeetingId: meetingId,
  TranscriptionConfiguration: {
    EngineTranscribeSettings: {
      LanguageCode: 'en-US',
      IdentifyLanguage: true,
      LanguageOptions: "en-US,de-DE",
      Region: 'us-east-1'
    }
  }
});

> A few questions to help us understand the problem better: When you refer to using the transcription service on its own, are you referring to using the Amazon Transcribe streaming APIs or the batch APIs? In addition, do you have a specific audio example to reproduce the poor results that you are seeing?

Yes, when I use the streaming APIs I have good results. I can try to prepare an example file.

> To add to @richhx's comment, the underlying Transcribe streaming API, which provides the language identification feature, is only able to detect the dominant language of the stream. You can find more information in the public doc. That is, if multiple languages are present in the audio stream, Transcribe Streaming will detect the first dominant language in the meeting and will not change it during the course of the meeting [unless you stop and re-start transcription with these settings].

That is a good clue as to why I am getting such results. I will try to dig into it more...

rfic avatar Jul 20 '22 16:07 rfic

Hey, I think I found where my issue was. I used the old Chime namespace, where language identification is not available. Here is the article on how to migrate to the new namespace: link
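For anyone else hitting this, roughly what that migration looks like for the call above (the exact packages and client setup are assumptions; see the linked article for the full migration steps):

// Old namespace (where language identification for meeting transcription
// is not available, as noted above):
// const { ChimeClient, StartMeetingTranscriptionCommand } = require('@aws-sdk/client-chime');
// const client = new ChimeClient({ region: 'us-east-1' });

// New namespace after migrating:
const {
  ChimeSDKMeetingsClient,
  StartMeetingTranscriptionCommand,
} = require('@aws-sdk/client-chime-sdk-meetings');

const client = new ChimeSDKMeetingsClient({ region: 'us-east-1' });

// The StartMeetingTranscriptionCommand input shown earlier
// (IdentifyLanguage, LanguageOptions, ...) keeps the same shape;
// only the client/package changes.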

rfic avatar Aug 17 '22 08:08 rfic

Hi all. Let me jump in ;)

I've just started exploring Amazon Chime, and I'm wondering whether it is possible to explicitly set the transcription language per attendee (i.e., each attendee sets their own language), or whether automatic detection is the only way. What if an attendee's language is not yet supported by Amazon Transcribe? Can I use a third-party transcription service (GCP, for instance)?

The next question: is it possible to preprocess the audio stream before it is delivered to a listener? I mean, to change the voice, add some effects, or something else.

Thanks

dobeerman avatar Sep 12 '22 12:09 dobeerman