
[Audio.transcribe] Logprobs for each token in verbose_json

Open sheikheddy opened this issue 2 years ago • 1 comments

Describe the feature or improvement you're requesting

Currently, Whisper exposes avg_logprob for an entire segment. The request is to expose logprobs for each token.

{
  "duration": 4.01,
  "language": "english",
  "segments": [
    {
      "avg_logprob": -0.40153955010806813,
      "compression_ratio": 1.0526315789473684,
      "end": 4.0,
      "id": 0,
      "no_speech_prob": 0.1633709967136383,
      "seek": 0,
      "start": 0.0,
      "temperature": 0.0,
      "text": " Testing, testing, this is going to be a new audio recording.",
      "tokens": [
        50364,
        45517,
        11,
        4997,
        11,
        341,
        307,
        516,
        281,
        312,
        257,
        777,
        6278,
        6613,
        13,
        50564
      ],
      "transient": false
    }
  ],
  "task": "transcribe",
  "text": "Testing, testing, this is going to be a new audio recording."
}
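For reference, this is roughly how the segment-level value can be read today. It is a minimal sketch: the `response_text` string is a trimmed, hand-copied version of the payload above (not a live API call), and treating `exp(avg_logprob)` as a rough per-token confidence is an interpretation, not something the API documents.

```python
import json
import math

# Trimmed copy of the verbose_json payload above (illustrative, not a live response).
response_text = """
{"segments": [{"id": 0,
               "avg_logprob": -0.40153955010806813,
               "tokens": [50364, 45517, 11, 4997, 11, 341, 307, 516,
                          281, 312, 257, 777, 6278, 6613, 13, 50564]}]}
"""

data = json.loads(response_text)

# One (segment id, token count, exp(avg_logprob)) tuple per segment.
summary = []
for seg in data["segments"]:
    summary.append((seg["id"], len(seg["tokens"]), math.exp(seg["avg_logprob"])))

for seg_id, n_tokens, confidence in summary:
    print(f"segment {seg_id}: {n_tokens} tokens, exp(avg_logprob) = {confidence:.3f}")
```

There is a single number per segment, so there is no way to tell which of the tokens the model was unsure about.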

Additional context

No response

sheikheddy, Mar 01 '23 22:03

This is due to a limitation in Whisper itself. I skimmed through https://github.com/openai/whisper/blob/7858aa9c08d98f75575035ecd6481f462d66ca27/whisper/decoding.py#L110 and the good news is that it doesn't look like it would be hard to change. The key line of code is:

logprobs = F.log_softmax(logits.float(), dim=-1)

The main modification you'd need to make is adding token probabilities to https://github.com/openai/whisper/blob/7858aa9c08d98f75575035ecd6481f462d66ca27/whisper/transcribe.py#L23, where currently only avg_logprob is included.
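To make the idea concrete, here is a rough sketch of what "per-token logprobs" means, written in plain Python so it runs without PyTorch. It is not Whisper's code: `log_softmax` and `token_logprobs` are hypothetical helpers, the logits are toy values over a 4-token vocabulary, and the plain mean at the end is an assumption that may not match how Whisper normalizes avg_logprob.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]

def token_logprobs(step_logits, chosen_tokens):
    """For each decoding step, keep the log-probability of the token that was emitted."""
    return [
        log_softmax(logits)[token]
        for logits, token in zip(step_logits, chosen_tokens)
    ]

# Toy data: 3 decoding steps over a vocabulary of 4 tokens (made-up values).
step_logits = [
    [2.0, 0.5, 0.1, -1.0],
    [0.0, 3.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
chosen = [0, 1, 2]

per_token = token_logprobs(step_logits, chosen)

# A single averaged value hides which individual tokens were uncertain.
# (Plain mean for illustration only.)
avg = sum(per_token) / len(per_token)

print(per_token)
print(avg)
```

In this toy run the third step is a uniform distribution, so its token is far less certain than the first two, which is exactly the detail an averaged value loses.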

sheikheddy, Mar 02 '23 05:03

Hey! I flagged this to the team. I am going to close this for now, since this repo is for the Python SDK rather than API feedback, but I will follow up if we end up adding this.

logankilpatrick, Mar 03 '23 16:03