
Typescript - incorrect type when using verbose_json as the whisper transcription response_format

Open jessebs opened this issue 1 year ago • 4 comments

Confirm this is a Node library issue and not an underlying OpenAI API issue

  • [X] This is an issue with the Node library

Describe the bug

With Whisper, when using the verbose_json response_format parameter, audio.transcriptions.create returns a value typed as Transcription, which does not include the extra fields that verbose_json adds to the response.

To Reproduce

See the code snippet

Code snippets

const response = await openAIClient.audio.transcriptions.create({
  model: 'whisper-1',
  file: fileStream,
  response_format: 'verbose_json',
  timestamp_granularities: ['segment']
})

console.log(response.text)
// @ts-ignore
console.log(response['language']) // language isn't part of the Transcription interface

OS

macOS

Node version

18.16.1

Library version

openai v4.28.4

jessebs avatar Mar 03 '24 05:03 jessebs

We hope to add support for this in the coming months.

rattrayalex avatar Mar 05 '24 04:03 rattrayalex

I just ran this today, and the response I get back does include language:

const transcription = await this.client.audio.transcriptions.create({
    file: fs.createReadStream(tempFileName),
    response_format: 'verbose_json',
    model: 'whisper-1'
});
{
  task: "transcribe",
  language: "english",
  duration: 2.0399999618530273,
  text: "Hello World and all the bunnies!",
  segments: [
    {
      id: 0,
      seek: 0,
      start: 0,
      end: 2,
      text: " Hello World and all the bunnies!",
      tokens: [ 50364, 2425, 3937, 293, 439, 264, 6702, 40549, 0, 50464 ],
      temperature: 0,
      avg_logprob: -0.5200682878494263,
      compression_ratio: 0.8421052694320679,
      no_speech_prob: 0.017731403931975365,
    }
  ],
}

I'm piggybacking off this issue: I noticed that if I set the response_format to 'text', the declared return type is still a Transcription object, but I actually receive a plain string. That's fine, but it confuses TypeScript, so I have to do:

return result as unknown as string;
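
The double cast can also be pushed into a small helper with a runtime check. A minimal sketch; the Transcription interface below is a local stand-in, not the SDK's actual export:

```typescript
// Local stand-in for the SDK's Transcription interface (an assumption,
// not imported from openai).
interface Transcription {
  text: string;
}

// With response_format: 'text' the SDK returns a plain string at runtime,
// so narrow on typeof instead of writing `as unknown as string` everywhere.
function transcriptionText(result: Transcription | string): string {
  return typeof result === "string" ? result : result.text;
}

console.log(transcriptionText("Hello World and all the bunnies!"));
console.log(transcriptionText({ text: "Hello World and all the bunnies!" }));
```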

Question: are more tokens required/credits used if I request 'verbose_json' vs 'text'?

dereckmezquita avatar Mar 16 '24 02:03 dereckmezquita

Hello everyone,

I encountered the same issue regarding the return object. As a temporary workaround in my project, I added an interface based on the documentation to better handle the function's return.
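
For reference, such a workaround interface might look like the sketch below. The field names are assumptions taken from the verbose_json payload pasted earlier in this thread, not from the library's own types:

```typescript
// Sketch of a verbose_json segment, derived from the JSON shown above
// in this thread; not an official openai-node export.
interface TranscriptionSegment {
  id: number;
  seek: number;
  start: number;
  end: number;
  text: string;
  tokens: number[];
  temperature: number;
  avg_logprob: number;
  compression_ratio: number;
  no_speech_prob: number;
}

interface VerboseTranscription {
  task: string;
  language: string;
  duration: number;
  text: string;
  segments?: TranscriptionSegment[];
}

// Usage sketch: cast the SDK response once, then use it with full typing.
// (In real code, `response` would come from audio.transcriptions.create.)
const response: unknown = {
  task: "transcribe",
  language: "english",
  duration: 2.04,
  text: "Hello World and all the bunnies!",
  segments: [],
};

const verbose = response as VerboseTranscription;
console.log(verbose.language); // "english"
console.log(verbose.duration); // 2.04
```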

Another approach could be to use a fork of the project and implement this fix there. However, that requires staying vigilant for updates and conflicts from the upstream repository.

To address this, I've submitted a Pull Request with the correction, hoping the maintainers will integrate this fix. Let's wait and see.

@dereckmezquita Regarding whether the cost differs depending on the response type, I'm not certain, but I believe it does not: billing is based on the audio/tokens processed, not the size of the response. What the documentation does make clear is that requesting "verbose_json" with word-level timestamp granularity increases latency.

wrogati avatar Mar 24 '24 16:03 wrogati

Hello, I have a similar issue: when using the verbose_json response format, I get the error "Property 'duration' does not exist on type 'Transcription'" even though the duration field is present in the returned JSON. I am using openai version 4.53.1.

eguenou avatar Jul 28 '24 13:07 eguenou

Thanks for the report and sorry for the delay, this will be fixed in the next release! https://github.com/openai/openai-node/pull/1103

RobertCraigie avatar Sep 27 '24 23:09 RobertCraigie