
[Whisper] cannot call `createTranscription` function from Node.js due to File API

Open tmgauss opened this issue 2 years ago • 9 comments

Describe the bug

I cannot call the createTranscription function, as shown below:

...
const audio = await fs.readFile('path/to/audio.mp4');
// Compile Error at the first argument
const response = await openai.createTranscription(audio, 'whisper-1');

This is because the createTranscription interface asks for a File, which is primarily a browser API.

public createTranscription(file: File, model: string, prompt?: string, responseFormat?: string, temperature?: number, language?: string, options?: AxiosRequestConfig) {
  return OpenAIApiFp(this.configuration).createTranscription(file, model, prompt, responseFormat, temperature, language, options).then((request) => request(this.axios, this.basePath));
}

How can I use this function from Node.js? Thanks!


Node.js version: v18.14.2
macOS Monterey

To Reproduce

...
const audio = await fs.readFile('path/to/audio.mp4');
// Compile Error at the first argument
const response = await openai.createTranscription(audio, 'whisper-1');

Code snippets

No response

OS

macOS

Node version

Node v18.14.2

Library version

openai v3.2.1

tmgauss avatar Mar 02 '23 10:03 tmgauss

Add support for loading file from Blob, Stream or base64 encoded string.

rmtuckerphx avatar Mar 02 '23 12:03 rmtuckerphx

+1 for support for other formats, as @rmtuckerphx mentioned, especially for those migrating from Google's Speech-to-text (base64)

estevanmaito avatar Mar 02 '23 17:03 estevanmaito

+1 please support other formats, I'm surprised this was overlooked

Pckool avatar Mar 02 '23 18:03 Pckool

Workaround at this time

await openai.createTranscription(fs.createReadStream('/path/to/audio.m4a') as any, 'whisper-1');

chrg1001 avatar Mar 03 '23 01:03 chrg1001

The "Readable" stream doesn't work either, so if your file is in memory, the only way to upload it seems to be by writing to the disk first and then using createReadStream.

zlenner avatar Mar 03 '23 01:03 zlenner

Workaround at this time

await openai.createTranscription(fs.createReadStream('/path/to/audio') as any, 'whisper-1');

Thanks for the workaround. I am seeing a 400 response when I use this approach. The response indicates Invalid file format. The file format is webm and the same file works with curl.

Edit:

Apologies for the spam. In my case, I had to rename /path/to/audio so that it included the file extension like /path/to/audio.webm

nrempel avatar Mar 05 '23 15:03 nrempel

It seems OpenAI is hacking around MIME type discovery for the input audio file by using the .path property of a stream - it's present on fs.createReadStream() by default, which is why that works and Readable.from() does not.

You can do this:

const audioReadStream = Readable.from(audioBuffer);
audioReadStream.path = 'conversation.wav';
const {data: {text}} = await openai.createTranscription(audioReadStream, 'whisper-1');

jacoblee93 avatar Mar 05 '23 23:03 jacoblee93

It seems OpenAI is hacking around MIME type discovery for the input audio file by using the .path property of a stream - it's present on fs.createReadStream() by default, which is why that works and Readable.from() does not.

You can do this:

const audioReadStream = Readable.from(audioBuffer);
audioReadStream.path = 'conversation.wav';
const {data: {text}} = await openai.createTranscription(audioReadStream, 'whisper-1');

As a note for anyone using TypeScript: you should use `@ts-expect-error` here in case this is ever fixed, so you can update the functionality.

ryandotelliott avatar Mar 10 '23 02:03 ryandotelliott

Safari records audio as audio/mp4 when using JavaScript's MediaRecorder, and it doesn't seem possible to trick OpenAI with the stream.path technique above because the file would need to be converted.

Does anyone have any ideas of how to sort this without converting the file?

AlexNeep avatar Mar 16 '23 20:03 AlexNeep

Safari records audio as audio/mp4 when using JavaScript's MediaRecorder, and it doesn't seem possible to trick OpenAI with the stream.path technique above because the file would need to be converted.

Does anyone have any ideas of how to sort this without converting the file?

I'm also facing this issue. I was able to convert from mp4 to mp3 using ffmpeg, but this isn't an ideal solution and I'm hoping the API will be fixed.

jakowenko avatar Mar 20 '23 13:03 jakowenko

I cannot get this API to work without writing every file to disk, which I very much do not want to do. Pretty big oversight, hoping for a fix.

ctb248 avatar Mar 23 '23 23:03 ctb248

Workaround at this time

await openai.createTranscription(fs.createReadStream('/path/to/audio') as any, 'whisper-1');

Thanks for the workaround. I am seeing a 400 response when I use this approach. The response indicates Invalid file format. The file format is webm and the same file works with curl.

Edit:

Apologies for the spam. In my case, I had to rename /path/to/audio so that it included the file extension like /path/to/audio.webm

this solution works

tylim88 avatar Mar 24 '23 23:03 tylim88

I cannot get this API to work without writing every file to disk, which I very much do not want to do. Pretty big oversight, hoping for a fix.

Here's a streaming example that might help?

I'm piping an http .ogg audio request stream into ffmpeg to convert it into an mp3 audio stream which I pipe into an OpenAI transcription request.

import {spawn} from 'child_process'
import {Readable} from 'stream'

async function transcribe(input: Readable) {
    // Converting .ogg to .mp3
    const proc = spawn('ffmpeg', ['-f', 'ogg', '-i', '-', '-f', 'mp3', '-'])
    input.pipe(proc.stdin)
    proc.stdout.path = 'upload.mp3' // Necessary to quack like a file upload
    const result = await openai.createTranscription(proc.stdout, 'whisper-1')
    return result.data.text
}

async function example() {
    const response = await fetch('http://example.com/audio.ogg')
    const nodeStream = Readable.fromWeb(response.body)
    const transcription = await transcribe(nodeStream)
    console.log('the audio file said: ', transcription)
}

danneu avatar Mar 25 '23 22:03 danneu

I'm trying to use the default Node.js implementation described at https://platform.openai.com/docs/api-reference/audio/create?lang=node and always get a 400 error when trying to transcribe any MP4.

I've tried all the solutions in this thread; nothing works.

AnibalDuarte avatar Mar 28 '23 22:03 AnibalDuarte

I cannot get this API to work without writing every file to disk, which I very much do not want to do. Pretty big oversight, hoping for a fix.

Here's a streaming example that might help?

I'm piping an http .ogg audio request stream into ffmpeg to convert it into an mp3 audio stream which I pipe into an OpenAI transcription request.

import {spawn} from 'child_process'
import {Readable} from 'stream'

async function transcribe(input: Readable) {
    // Converting .ogg to .mp3
    const proc = spawn('ffmpeg', ['-f', 'ogg', '-i', '-', '-f', 'mp3', '-'])
    input.pipe(proc.stdin)
    proc.stdout.path = 'upload.mp3' // Necessary to quack like a file upload
    const result = await openai.createTranscription(proc.stdout, 'whisper-1')
    return result.data.text
}

async function example() {
    const response = await fetch('http://example.com/audio.ogg')
    const nodeStream = Readable.fromWeb(response.body)
    const transcription = await transcribe(nodeStream)
    console.log('the audio file said: ', transcription)
}

I gave in and just wrote the files to disk, yet it STILL throws an invalid format error with a totally functional .wav/mp3/etc.. I was reading that there's some kind of encoding problem with the version of ffmpeg they're using behind the scenes. So it seems even this may or may not work depending on which version/format/codec you're using. At this point it's probably easier to just Dockerize a local instance of whisper.cpp. Maybe they should have ChatGPT help fix this API 😀

ctb248 avatar Mar 28 '23 23:03 ctb248

I've fixed it using this:

module.exports = chatGptModel = {
  async transcribe(filename) {
    const inputFile = fs.createReadStream(filename);
    const outputPath = filename.replace('.ogg', '.mp3');
    const outputStream = fs.createWriteStream(outputPath);
    const conversion = spawn('ffmpeg', ['-i', 'pipe:0', '-f', 'mp3', 'pipe:1']);
    inputFile.pipe(conversion.stdin);
    conversion.stdout.pipe(outputStream);
    return new Promise((resolve, reject) => {
      conversion.on('close', () => resolve(outputPath));
      conversion.on('error', (e) => reject(e));
    });
  },
  async transcriptAudio(filename, user) {
    if (!filename.includes('.mp3')) {
      filename = await chatGptModel.transcribe(filename);
    }
    try {
      const resp = await openai.createTranscription(
        fs.createReadStream(filename),
        'whisper-1'
      );
      return resp.data.text;
    } catch (e) {
      console.log('Transcription error on ' + filename, e);
      return 'An error occurred during transcription.';
    }
  },
};

AnibalDuarte avatar Mar 29 '23 03:03 AnibalDuarte

That API is such a burden to use today.

I came up with a solution to avoid storing the file on the server.


const axios = require("axios");
const ffmpeg = require("fluent-ffmpeg");
const { Readable, Writable } = require("stream");
const fs = require("fs");
const { Configuration, OpenAIApi } = require("openai");


async function callWhisper(url, languageCode) {
  try {
    // Fetch the audio from a URL
    const response = await axios.get(url, {
      responseType: "arraybuffer",
    });
    // Making a stream out of the buffer
    const inputStream = arrayBufferToStream(response.data);
    // We want to avoid the 25 MB limitation and ensure that the audio file is within the acceptable size range for the API.
    const resizedBuffer = await reduceBitrate(inputStream);
    //  This step is necessary because the OpenAI API expects a stream as input for the audio file.
    const resizedStream = bufferToReadableStream(resizedBuffer, "audio.mp3");
    const configuration = new Configuration({
      apiKey: process.env.OPEN_API_KEY
    });

    const openai = new OpenAIApi(configuration);
    let prompt = "YOUR PROMPT"

    const resp = await openai.createTranscription(resizedStream, "whisper-1", prompt, "verbose_json", 0.8, languageCode, { maxContentLength: Infinity, maxBodyLength: Infinity });
    return resp.data;
  } catch (error) {
    console.error(error);
  }
}

callWhisper("YOUR_URL", "en");


function reduceBitrate(inputStream) {
  return new Promise((resolve, reject) => {
    const outputChunks = [];
    ffmpeg(inputStream)
      .audioBitrate(64) // low quality. You can update that
      .on("error", reject)
      .on("end", () => resolve(Buffer.concat(outputChunks)))
      .format("mp3")
      .pipe(
        new Writable({
          write(chunk, encoding, callback) {
            outputChunks.push(chunk);
            callback();
          },
        })
      );
  });
}

function bufferToReadableStream(buffer, filename) {
  const readable = new Readable({
    read() {
      this.push(buffer);
      this.push(null);
    },
  });
  readable.path = filename;
  return readable;
}
function arrayBufferToStream(buffer) {
  const readable = new Readable({
    read() {
      this.push(Buffer.from(buffer));
      this.push(null);
    },
  });
  return readable;
}


romain130492 avatar Apr 08 '23 14:04 romain130492

When working with FormData, this works for me (using .webm) without storing a file on the server and/or using ffmpeg:

Server.js

const data = Object.fromEntries(await request.formData());
const fileStream = Readable.from(Buffer.from(await (data.audio as Blob).arrayBuffer()));
// @ts-expect-error Workaround till OpenAI fixed the sdk
fileStream.path = 'audio.webm';
const transcription = await openai.createTranscription(
  fileStream as unknown as File,
  'whisper-1'
);

Client.js

let media: Blob[] = [];
let mediaRecorder: MediaRecorder;

const stream = await navigator.mediaDevices.getUserMedia({ audio: true });

mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = async (e) => {
  if (!e.data?.size) return;
  media.push(e.data);
};

async function upload() {
  const formData = new FormData();
  formData.append('audio', new Blob(media, { type: 'audio/webm;codecs=opus' }), 'audio.webm');
  await fetch('/', { method: 'POST', body: formData });
}

I'm using the MediaRecorder in this case, but a simple form would also work.

<form method="POST">
  <input type="file" name="audio" />
  <button type="submit">Upload</button>
</form>

B3nsten avatar Apr 15 '23 01:04 B3nsten

has anyone maybe an example with vercel edge functions?

Manubi avatar Apr 21 '23 12:04 Manubi

Nuxt 3 has cross-platform support for Node.js, browsers, service workers and more, with serverless support out of the box. This code works really nicely:

import { Configuration, OpenAIApi } from "openai";
import fs from "node:fs";

export default defineEventHandler(async (event) => {

    const config = useRuntimeConfig()

    const configuration = new Configuration({
        apiKey: config.OPENAI_API_KEY,
    });
    const openai = new OpenAIApi(configuration);

    try {

        const resp = await openai.createTranscription(
            // @ts-ignore
            fs.createReadStream("audio.mp3"), // Can use the file event here
            "whisper-1"
        );

        return resp.data
    } catch (error) {
        console.error('server error', error)
    }


})

dosstx avatar Apr 21 '23 15:04 dosstx

As a workaround I am using the npm package form-data and manually sending the request.

It was way easier to figure out than how to trick the existing lib into working...

e.g.:

import axios from 'axios';
import * as ffmpeg from 'fluent-ffmpeg';
import * as FormData from 'form-data';
import { Readable, Transform } from 'stream';

...

/**
 * Helper function for downloading a voice message and transcribing it using the OpenAI API
 *  - takes in a voice message URL (ogg format), e.g. from Telegram servers
 *  - downloads the voice into memory
 *  - converts it in-memory from ogg to mp3 using fluent-ffmpeg
 *  - sends the converted mp3 to the OpenAI API for transcription
 * @param voiceUrl .ogg web file
 * @returns transcription | null
 */
private async transcribeVoice(voiceUrl: string): Promise<string | null> {
	const response = await axios.get(voiceUrl, { responseType: 'arraybuffer' });
	const voiceData = response.data;

	const voiceReadable = new Readable();
	voiceReadable.push(voiceData);
	voiceReadable.push(null);

	const convertedAudio = await new Promise<Buffer>((resolve, reject) => {
		const chunks: Buffer[] = [];
		const transformStream = new Transform({
			transform(chunk, encoding, callback) {
				chunks.push(chunk);
				callback();
			},
		});

		ffmpeg(voiceReadable, {})
			.inputFormat('ogg')
			.audioCodec('libmp3lame')
			.format('mp3')
			.on('error', (error) => {
				this._logger.error('Error converting audio:', error);
				reject(error);
			})
			.on('end', () => {
				this._logger.debug('Audio conversion successful: ogg -> ffmpeg -> mp3');
				resolve(Buffer.concat(chunks));
			})
			.pipe(transformStream);
	});

	const formData = new FormData();
	formData.append('file', convertedAudio, { filename: 'voice.mp3', contentType: 'audio/mp3' });
	formData.append('model', 'whisper-1');

	const openaiUrl = 'https://api.openai.com/v1/audio/transcriptions';

	const transcriptionResponse = await axios.post(openaiUrl, formData, {
		headers: {
			...formData.getHeaders(),
			Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
		},
	}).catch(reason => {
		this._logger.error('Error when transcribing VoiceToText via OpenAI API:', reason);
	});

	const transcript = transcriptionResponse?.data?.text;
	if (!transcript) {
		this._logger.error(`Audio transcript failed: '${JSON.stringify(transcriptionResponse?.data)}'`);
		return null;
	}

	this._logger.debug(`Audio transcript successful: '${transcript}'`);
	return transcript;
}

hexxone avatar Apr 30 '23 12:04 hexxone

I made a pull request that would fix this problem #171

Lev-Shapiro avatar Jun 10 '23 15:06 Lev-Shapiro

Hi all, we have an upcoming version v4.0.0 that overhauls file upload support extensively; please give it a try and let us know in the thread whether it suits your needs!

rattrayalex avatar Jun 27 '23 21:06 rattrayalex

Hi all, we have an upcoming version v4.0.0 that overhauls file upload support extensively; please give it a try and let us know in the thread whether it suits your needs!

Thanks dude, the update fixed the issue 👍

huboh avatar Jun 30 '23 18:06 huboh

Great, thank you @huboh ! We encourage others on the thread to give it a try and share feedback.

rattrayalex avatar Jul 08 '23 19:07 rattrayalex

Hey guys, with the new update to the API in November 2023, the solutions above no longer work... I made a post about this in the OpenAI developer forum: https://community.openai.com/t/creating-readstream-from-audio-buffer-for-whisper-api/534380. Does anyone know a solution?

EJKT avatar Dec 03 '23 15:12 EJKT

Workaround at this time

await openai.createTranscription(fs.createReadStream('/path/to/audio.m4a') as any, 'whisper-1');

This doesn't help as it returns an error message saying "Type assertion expressions can only be used in TypeScript files."

ghagevaibhav avatar Apr 16 '24 22:04 ghagevaibhav

For whoever finds this trying to call the API without creating a temporary file, here's how I was able to do it:

import { toFile } from "openai";

async function voiceToText(buffer: NodeJS.ReadableStream) {
    const response = await openai.audio.transcriptions.create({
        model: "whisper-1",
        file: await toFile(buffer, "audio.wav"), // << here
        response_format: "text",
    });

    return response as unknown as string;
}

All credit goes to: https://dev.to/ajones_codes/how-to-get-audio-transcriptions-from-whisper-without-a-file-system-21ek

API Version: ^4.53.2

elumixor avatar Aug 02 '24 10:08 elumixor