openai-node
Typescript - incorrect type when using verbose_json as the whisper transcription response_format
Confirm this is a Node library issue and not an underlying OpenAI API issue
- [X] This is an issue with the Node library
Describe the bug
With Whisper, when using the verbose_json response_format parameter, audio.transcriptions.create returns the type Transcription, which does not include the extra fields that verbose_json adds (e.g. language, duration, segments).
To Reproduce
See the code snippet
Code snippets
const response = await openAIClient.audio.transcriptions.create({
model: 'whisper-1',
file: fileStream,
response_format: 'verbose_json',
timestamp_granularities: ['segment']
})
console.log(response.text)
// @ts-ignore
console.log(response['language']) // language isn't part of the Transcription interface
OS
macOS
Node version
18.16.1
Library version
openai v4.28.4
We hope to add support for this in the coming months.
I just ran this today and the response I get back does include language:
const transcription = await this.client.audio.transcriptions.create({
file: fs.createReadStream(tempFileName),
response_format: 'verbose_json',
model: 'whisper-1'
});
{
task: "transcribe",
language: "english",
duration: 2.0399999618530273,
text: "Hello World and all the bunnies!",
segments: [
{
id: 0,
seek: 0,
start: 0,
end: 2,
text: " Hello World and all the bunnies!",
tokens: [ 50364, 2425, 3937, 293, 439, 264, 6702, 40549, 0, 50464 ],
temperature: 0,
avg_logprob: -0.5200682878494263,
compression_ratio: 0.8421052694320679,
no_speech_prob: 0.017731403931975365,
}
],
}
I'm piggybacking off this issue: I noticed that if I set response_format to 'text', the declared return type is still a Transcription object, but I actually receive a plain string. That's fine, but it confuses TypeScript, so I have to do:
return result as unknown as string;
Question: are more tokens required/credits used if I request 'verbose_json' vs 'text'?
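The cast workaround above can be confined to one small helper instead of being repeated at every call site. A minimal sketch (the helper name asPlainText is my own, not part of the SDK):

```typescript
// The SDK declares the return type as Transcription, but with
// response_format: 'text' the actual runtime value is a plain string.
// This guard performs the cast once and fails loudly if the shape
// is ever not what we expect.
function asPlainText(result: unknown): string {
  if (typeof result === "string") return result;
  throw new TypeError("expected a plain-text transcription string");
}

// Usage (sketch):
// const result = await client.audio.transcriptions.create({
//   model: 'whisper-1',
//   file: fileStream,
//   response_format: 'text',
// });
// return asPlainText(result);
```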
Hello everyone,
I encountered the same issue regarding the return object. As a temporary workaround in my project, I added an interface based on the documentation to better handle the function's return.
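For reference, the interface I mean looks roughly like this — field names are taken from the documented verbose_json response (and match the sample output earlier in this thread), but the interface names themselves are my own:

```typescript
// Hypothetical types for the verbose_json response, assembled from
// the documented fields; not part of the openai package itself.
interface TranscriptionSegment {
  id: number;
  seek: number;
  start: number;
  end: number;
  text: string;
  tokens: number[];
  temperature: number;
  avg_logprob: number;
  compression_ratio: number;
  no_speech_prob: number;
}

interface VerboseTranscription {
  task: string;
  language: string;
  duration: number;
  text: string;
  segments?: TranscriptionSegment[];
}

// Then cast once at the call site instead of using @ts-ignore:
// const verbose = (await client.audio.transcriptions.create({
//   model: 'whisper-1',
//   file: fileStream,
//   response_format: 'verbose_json',
// })) as unknown as VerboseTranscription;
```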
Another approach could be to use a fork of the project and implement this fix. However, this always necessitates staying vigilant for possible updates and conflicts from the original repository.
To address this, I've submitted a Pull Request with the correction, hoping the maintainers will integrate this fix. Let's wait and see.
@dereckmezquita Regarding whether the cost differs depending on the response format: I'm not certain, but I believe it does not. Billing is based on the data generated, counted in tokens, not on the size of the response. What the documentation does make clear is that requesting word-level timestamps with "verbose_json" increases latency.
Hello,
I hit a similar issue: when using the verbose_json response format, I get the error Property 'duration' does not exist on type 'Transcription', even though the duration field is present in the returned JSON.
I am using openai version 4.53.1.
Thanks for the report and sorry for the delay, this will be fixed in the next release! https://github.com/openai/openai-node/pull/1103