Support for Amazon Transcribe format for multiple channels (not the same as speakers)

Open gittes opened this issue 4 years ago • 2 comments

Found your wonderful software, but had a minor issue when loading an Amazon Transcribe transcript that used the variant format for independent audio channels, as opposed to the typical speakers format.

Impressively, your software still loaded the rows of the transcript correctly. However, it gave every speaker label a unique number suffix, so it was impossible to relabel the speakers all at once, and tracking and correcting a very long transcript by hand is an almost insurmountable task.

This format is used when each speaker is on a dedicated channel/track in the source audio file: https://docs.aws.amazon.com/transcribe/latest/dg/how-channel-id.html

Excerpt from the referenced AWS doc showing the JSON format:

{
  "jobName": "job id",
  "accountId": "account id",
  "results": {
    "transcripts": [
      {
        "transcript": "When you try ... It seems to ..."
      }
    ],
    "channel_labels": {
      "channels": [
        {
          "channel_label": "ch_0",
          "items": [
            {
              "start_time": "12.282",
              "end_time": "12.592",
              "alternatives": [
                {
                  "confidence": "1.0000",
                  "content": "When"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.592",
              "end_time": "12.692",
              "alternatives": [
                {
                  "confidence": "0.8787",
                  "content": "you"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.702",
              "end_time": "13.252",
              "alternatives": [
                {
                  "confidence": "0.8318",
                  "content": "try"
                }
              ],
              "type": "pronunciation"
            },
            Transcription abbreviated
         ]
      },
      {
          "channel_label": "ch_1",
          "items": [
            {
              "start_time": "12.379",
              "end_time": "12.589",
              "alternatives": [
                {
                  "confidence": "0.5645",
                  "content": "It"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.599",
              "end_time": "12.659",
              "alternatives": [
                {
                  "confidence": "0.2907",
                  "content": "seems"
                }
              ],
              "type": "pronunciation"
            },
            {
              "start_time": "12.669",
              "end_time": "13.029",
              "alternatives": [
                {
                  "confidence": "0.2497",
                  "content": "to"
                }
              ],
              "type": "pronunciation"
            },
            Transcription abbreviated
          ]
        }
      ]
    }
  }
}

The format has "channel_labels" (object) -> "channels" (array/list) -> one object per channel, each containing its own "items" for words, as opposed to "items" being declared once as in the other format. It also uses "channel_label" instead of "speaker_label" for speakers.
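
To make the nesting concrete, here is a minimal sketch that walks the channel structure and flattens it into one time-ordered word list, tagging each word with its "channel_label". The function name `flattenChannelItems` and the output field names are hypothetical and not part of react-transcript-editor; only the JSON keys come from the AWS excerpt above.

```javascript
// Hypothetical helper, not part of react-transcript-editor: flattens the
// channel format into a single list of words tagged with their channel.
function flattenChannelItems(transcribeJson) {
  const channels = transcribeJson.results.channel_labels.channels;
  const words = [];
  for (const channel of channels) {
    for (const item of channel.items) {
      // Only "pronunciation" items carry timings; skip everything else.
      if (item.type !== 'pronunciation') continue;
      words.push({
        start: parseFloat(item.start_time),
        end: parseFloat(item.end_time),
        text: item.alternatives[0].content,
        speaker: channel.channel_label,
      });
    }
  }
  // Interleave the channels by start time.
  return words.sort((a, b) => a.start - b.start);
}

// First word of each channel from the excerpt above.
const sample = {
  results: {
    channel_labels: {
      channels: [
        {
          channel_label: 'ch_0',
          items: [{
            start_time: '12.282',
            end_time: '12.592',
            alternatives: [{ confidence: '1.0000', content: 'When' }],
            type: 'pronunciation',
          }],
        },
        {
          channel_label: 'ch_1',
          items: [{
            start_time: '12.379',
            end_time: '12.589',
            alternatives: [{ confidence: '0.5645', content: 'It' }],
            type: 'pronunciation',
          }],
        },
      ],
    },
  },
};

const words = flattenChannelItems(sample);
console.log(words.map((w) => `${w.speaker}: ${w.text}`));
// [ 'ch_0: When', 'ch_1: It' ]
```

Because every word keeps its channel label, the labels stay consistent per channel and could be renamed in one pass.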

Could you please accommodate the Amazon Transcribe channel format variant, and at least keep speaker ID labels consistent per channel, if not matching the "channel_label"?

Just for reference, here's the doc for the speaker identification format: https://docs.aws.amazon.com/transcribe/latest/dg/how-diarization.html

gittes, May 18 '20 21:05

Hi @gittes, thanks for flagging this!

The AWS adapter, like many of the other adapters, was made thanks to community open-source contributions. See PR https://github.com/bbc/react-transcript-editor/pull/120

Remove the incremental counter

To remove the incremental counter, this line should be changed: packages/stt-adapters/amazon-transcribe/index.js#L140

-speaker: paragraph.speaker ? `Speaker ${ paragraph.speaker }` : `TBC ${ i }`,
+speaker: paragraph.speaker ? `Speaker ${ paragraph.speaker }` : `U_UKN`,

It doesn't have to be U_UKN, but STT services that return speaker diarization info sometimes produce labels like M_1 or F_2 etc. (e.g. when using Speechmatics).
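
As a quick illustration of why that one-line change matters, the sketch below compares the two fallbacks on paragraphs that carry no `speaker` value, which is presumably what the adapter sees with the channel format. The `paragraphs` array is a hypothetical stand-in for the adapter's internal data; only the two ternary expressions mirror the diff above.

```javascript
// Hypothetical stand-ins for the adapter's paragraph objects; none of them
// has a `speaker` field.
const paragraphs = [{ text: 'When you try' }, { text: 'It seems to' }];

// Current behaviour: the index makes every fallback label unique.
const withCounter = paragraphs.map((paragraph, i) =>
  paragraph.speaker ? `Speaker ${paragraph.speaker}` : `TBC ${i}`);

// Proposed behaviour: one shared fallback label.
const withConstant = paragraphs.map((paragraph) =>
  paragraph.speaker ? `Speaker ${paragraph.speaker}` : 'U_UKN');

console.log(withCounter); // [ 'TBC 0', 'TBC 1' ]
console.log(withConstant); // [ 'U_UKN', 'U_UKN' ]
```

With a shared label, relabelling one occurrence can be applied to all of them at once.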

AWS Adapter

There's a guide on how to make one from scratch under docs/guides/adapters.md for context, and the code for the existing AWS one is at packages/stt-adapters/amazon-transcribe.

AWS 2-channel JSON format

To accommodate that, it would be a matter of modifying the AWS STT adapter so that it:

  • keeps compatibility with the other AWS STT format
  • is able to distinguish between the two and uses the correct one
  • uses speaker diarization info if it is available, and otherwise falls back to a default
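
A minimal sketch of the "distinguish between the two" step, assuming the format can be told apart by which key is present under "results" ("channel_labels" for the channel variant, "speaker_labels" for the diarization variant, as in the two AWS docs linked in this thread). The function name `detectAwsFormat` and the returned strings are hypothetical and not taken from the existing adapter.

```javascript
// Hypothetical helper, not part of the existing adapter: reports which
// Amazon Transcribe variant a result object looks like.
function detectAwsFormat(transcribeJson) {
  const results = (transcribeJson && transcribeJson.results) || {};
  if (results.channel_labels) return 'channels'; // one "items" list per channel
  if (results.speaker_labels) return 'speakers'; // diarization segments
  return 'unlabelled'; // no speaker info: fall back to a default label
}

console.log(detectAwsFormat({ results: { channel_labels: { channels: [] } } }));
// channels
console.log(detectAwsFormat({ results: { speaker_labels: { segments: [] } } }));
// speakers
console.log(detectAwsFormat({ results: { transcripts: [] } }));
// unlabelled
```

The adapter could then branch on the returned value, leaving the existing code path untouched for the other formats.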

I don't want to speak for @jamesdools and @emettely, but I'm guessing a PR would be welcome, if you've got the time/capacity?


As a side note, at the moment I am mostly working on an alternative version, pietrop/slate-transcript-editor. It doesn't provide any adapters as part of the core components, but I've extracted some of the adapters from this module, e.g. pietrop/aws-to-dpe and pietrop/gcp-to-dpe, for when that type of conversion is needed, e.g. when working with AWS STT or Google STT.

pietrop, May 18 '20 21:05

@gittes - hi! Thanks for sending us a request to improve the adapters. We would be ecstatic if you could help us out by adding that compatibility, based on the information that @pietrop mentioned above. We would be happy to review and merge it.

emettely, May 19 '20 05:05