openai-python [Audio.transcribe] Logprobs for each token in verbose_json

Describe the feature or improvement you're requesting

Currently, Whisper exposes avg_logprob for an entire segment. The request is to expose logprobs for each token.

{
  "duration": 4.01,
  "language": "english",
  "segments": [
    {
      "avg_logprob": -0.40153955010806813,
      "compression_ratio": 1.0526315789473684,
      "end": 4.0,
      "id": 0,
      "no_speech_prob": 0.1633709967136383,
      "seek": 0,
      "start": 0.0,
      "temperature": 0.0,
      "text": " Testing, testing, this is going to be a new audio recording.",
      "tokens": [
        50364,
        45517,
        11,
        4997,
        11,
        341,
        307,
        516,
        281,
        312,
        257,
        777,
        6278,
        6613,
        13,
        50564
      ],
      "transient": false
    }
  ],
  "task": "transcribe",
  "text": "Testing, testing, this is going to be a new audio recording."
}
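For comparison, here is roughly what can be done with the response as it stands today: a minimal sketch that reads the segment-level avg_logprob and converts it to a probability. The response literal is an abridged copy of the one above, and segment_confidence is a hypothetical helper name, not part of the openai package. Treating exp(avg_logprob) as a geometric-mean token probability is an interpretation, not something the API documents.

```python
import json
import math

# Abridged copy of the verbose response shown above (most fields omitted).
response_text = """
{
  "segments": [
    {
      "id": 0,
      "avg_logprob": -0.40153955010806813,
      "text": " Testing, testing, this is going to be a new audio recording."
    }
  ]
}
"""


def segment_confidence(segment):
    """Hypothetical helper: map a segment's avg_logprob to a value in (0, 1]."""
    return math.exp(segment["avg_logprob"])


response = json.loads(response_text)
confidences = [segment_confidence(seg) for seg in response["segments"]]

for seg, conf in zip(response["segments"], confidences):
    print(f"segment {seg['id']}: {conf:.2f} {seg['text']!r}")
```

This gives one number per segment, so a single uncertain word is averaged away by the confident tokens around it, which is the motivation for per-token values.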

Additional context

No response

Mar 01 '23 22:03 sheikheddy

This is due to a limitation in Whisper. I skimmed through https://github.com/openai/whisper/blob/7858aa9c08d98f75575035ecd6481f462d66ca27/whisper/decoding.py#L110 and the good news is that it doesn't look hard to change. The key line of code is this:

logprobs = F.log_softmax(logits.float(), dim=-1)

The main modification you'd need to make is adding per-token log probabilities to https://github.com/openai/whisper/blob/7858aa9c08d98f75575035ecd6481f462d66ca27/whisper/transcribe.py#L23, where currently only avg_logprob is included.
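To make the idea concrete, here is a minimal pure-Python sketch of it, not Whisper's actual code. It assumes you have the raw logits for each decoding step plus the token ids that were chosen. The log_softmax function below stands in for F.log_softmax, token_logprobs is a hypothetical helper, and the 3-entry vocabulary is a toy. The average is a plain mean, and Whisper's own normalization may differ.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one step's logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def token_logprobs(step_logits, tokens):
    """Hypothetical helper: log probability of each chosen token.

    step_logits[i] holds the logits over the vocabulary at step i,
    and tokens[i] is the token id selected at that step.
    """
    return [log_softmax(logits)[tok] for logits, tok in zip(step_logits, tokens)]


# Toy data: two decoding steps over a 3-token vocabulary.
step_logits = [[2.0, 0.5, 0.1], [0.0, 0.0, 0.0]]
tokens = [0, 2]

per_token = token_logprobs(step_logits, tokens)
avg_logprob = sum(per_token) / len(per_token)  # plain mean, for illustration only

print(per_token)
print(avg_logprob)
```

The per_token list is what this issue asks to have returned next to avg_logprob, so that low-confidence tokens can be picked out individually.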

Mar 02 '23 05:03 sheikheddy

Hey! I flagged this to the team. I am going to close this for now, since this repo is for the Python SDK rather than API feedback, but I will follow up if we end up adding this.

Mar 03 '23 16:03 logankilpatrick
