google-cloud-node
Surprising errors in production code using v2 of @google-cloud/speech
Hey guys, our system started producing a surprising number of errors today; our PROD server is affected, and so are all our users.
Error on "error" in recognizeStream {"code":3,"details":"Audio data does not appear to be in a supported encoding. If you believe this to be incorrect, try explicitly specifying the decoding parameters.","metadata":{}}
We did not change our implementation, and I hope the recent Google Chrome update did not touch the audio interfaces either. We are using the MediaRecorder API, and up until today all users were happy and had their streams recognized successfully.
Here is our main service:
type StreamingRecognitionConfig =
  protos.google.cloud.speech.v2.IStreamingRecognitionConfig;

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });
      const recognizer = findRecognizerByLanguageCode(language).name;
      const streamingConfig: StreamingRecognitionConfig = {
        config: {
          autoDecodingConfig: {},
        },
        streamingFeatures: {
          interimResults: false,
          enableVoiceActivityEvents: true, // enable voice activity events
          voiceActivityTimeout: {
            speechStartTimeout: { seconds: 60 },
            speechEndTimeout: { seconds: 60 },
          },
        },
      };
      const configRequest = {
        recognizer,
        streamingConfig,
      };
      logger.info('Creating Google service with recogniser:', recognizer);
      const recognizeStream = client
        ._streamingRecognize()
        .on('error', error => {
          logger.error(
            'Error on "error" in recognizeStream',
            JSON.stringify(error)
          );
          send({ type: 'ERROR', data: parseErrorMessage(error) });
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          logger.warn('Google recognizeStream ended');
        });
      let configSent = false;
      let headersSent = false;
      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        // v2 expects the config request as the first write on the stream.
        if (!configSent) {
          recognizeStream.write(configRequest);
          configSent = true;
          return;
        }
        // Then the WebM headers, then audio chunks, each wrapped in { audio }.
        if (configSent && !headersSent) {
          recognizeStream.write({ audio: headers });
          headersSent = true;
          return;
        }
        recognizeStream.write({ audio });
      };
      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };
      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
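For context, here is roughly how we wire the service up; the socket server, machine, and message framing below are placeholders, not our actual code:

const service = await createGoogleService({
  language: 'en-US',
  send: (event) => machine.send(event), // forward events to our state machine
});

let headers: Buffer | undefined;
ws.on('message', (chunk: Buffer) => {
  headers ??= chunk; // the first MediaRecorder chunk carries the WebM headers
  service.transcribeAudio(chunk, headers);
});
ws.on('close', () => service.stop());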
Hey @sorokinvj, which file types are affected?
Hey @danielbankhead, we are using real-time transcription. Surprisingly, until yesterday we were able to use real-time with v2 and the WEBM_OPUS encoding, although I now see that there is no such encoding in v2! Only:
AUDIO_ENCODING_UNSPECIFIED = 0,
LINEAR16 = 1,
MULAW = 2,
ALAW = 3
Our setup relied on autoDecodingConfig: {}, though. Do you guys support 'audio/webm;codecs=opus' in v2?
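For reference, if we had to use one of those encodings explicitly, I understand a v2 config would look roughly like this (a sketch only; LINEAR16 here would mean re-encoding on the frontend, which is exactly why we relied on autoDecodingConfig):

const streamingConfig: StreamingRecognitionConfig = {
  config: {
    // Explicitly declare the encoding instead of relying on auto-detection.
    explicitDecodingConfig: {
      encoding: 'LINEAR16', // one of the encodings listed above
      sampleRateHertz: 48000,
      audioChannelCount: 1,
    },
  },
};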
For now we have rolled back to v1 with the code below, and everything went back to normal:
export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });
      const recognizeStream = client
        .streamingRecognize({
          config: {
            encoding: 'WEBM_OPUS',
            sampleRateHertz: 48000,
            languageCode: language,
            enableAutomaticPunctuation: true,
            enableSpokenPunctuation: {
              value: true,
            },
          },
          interimResults: false,
          enableVoiceActivityEvents: true,
        })
        .on('error', error => {
          logger.error('Error on "error" in recognizeStream', error);
          send({ type: 'ERROR', data: parseErrorMessage(error) });
          reject(error);
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          send({
            type: 'TRANSCRIPTION_SERVICE_CLOSED',
            data: 'TRANSCRIPTION_SERVICE_CLOSED',
          });
        });
      let headersSent = false;
      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!headersSent) {
          recognizeStream.write(headers);
          headersSent = true;
          return;
        }
        recognizeStream.write(audio);
      };
      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };
      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
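Note the difference in what each version expects you to write to the stream; this is the part that tripped us up when migrating (a sketch of the two write shapes):

// v1: config is passed to streamingRecognize(), then raw Buffers are written:
recognizeStream.write(audioBuffer);

// v2: the first write must be the config request, and every audio write
// wraps the chunk in a request object:
recognizeStream.write({ recognizer, streamingConfig }); // first write
recognizeStream.write({ audio: audioBuffer });          // subsequent writes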
On the frontend we are using the basic MediaRecorder API to send the data:
navigator.mediaDevices
  .getUserMedia(constraints)
  .then((media) => {
    // Continue to play the captured audio to the user.
    const output = new AudioContext();
    const source = output.createMediaStreamSource(media);
    source.connect(output.destination);
    const audioStream = new MediaStream(media.getAudioTracks());
    const silenceDetector = new SilenceDetector(audioStream);
    const mediaRecorder = new MediaRecorder(audioStream, {
      mimeType: MIME_TYPE,
    });
    let audioHeaders: BlobEvent;
    mediaRecorder.ondataavailable = (event: BlobEvent) => {
      // The first chunk contains the WebM headers; keep it for later sends.
      if (!audioHeaders) {
        audioHeaders = event;
      }
      const isSilent = silenceDetector?.getIsSilent();
      if (!isSilent) {
        if (!audioHeaders) {
          logger.error('No audio headers found');
          return;
        }
        sendAudioChunk(event, audioHeaders);
      }
    };
    mediaRecorder.start(TIMESLICE_INTERVAL);
  });
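sendAudioChunk is our own helper; roughly, it reads the Blob payloads into ArrayBuffers and ships them over our socket (the socket and event name below are placeholders):

const sendAudioChunk = async (event: BlobEvent, headers: BlobEvent) => {
  const [audio, headerBytes] = await Promise.all([
    event.data.arrayBuffer(),
    headers.data.arrayBuffer(),
  ]);
  socket.emit('audio', { audio, headers: headerBytes });
};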
@danielbankhead We have the same use case and issue, and it is very difficult for us to move back to v1. Any update on it? It is critical to our system.
WEBM OPUS should be supported:
- https://cloud.google.com/speech-to-text/docs/encoding
- https://github.com/googleapis/google-cloud-node/blob/34e36a6c72a21ef8bb383233d15e2b82dfada8da/packages/google-cloud-speech/protos/google/cloud/speech/v2/cloud_speech.proto#L708
I will see what’s going on.
@danielbankhead
Thanks for the quick reply! If you need more info: it seems to have started around Aug 6 (we started seeing tons of these errors in our logs on GCP). I also tested webm files that I am 100% sure worked before (we save each audio file together with the produced text), and they do not work now, even though nothing has changed on our side.
Also experiencing this issue, also with WebM; it seems to have broken a few days ago.
Also experiencing this issue
Update: the service team is aware of this issue; I should have another update soon.
Any updates?
A fix is rolling out and should be available shortly
@danielbankhead any news on the fix? Is it available already? Do you know which release version I should be looking for?
The issue is on the service side; no update is required on the client side. The rollback should be rolled out by now; however, I'm waiting for the service team to confirm.
The fix should be widely available now.
I can confirm that the problem is fixed. I sent an audio/webm file to the google-cloud-speech v2 recognize functionality and it worked (it didn't tell me the file format was invalid).
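For anyone who wants to verify on their side, this is roughly the check I ran (the recognizer path and file name are mine, and I let autoDecodingConfig detect the format):

// Quick sanity check against v2 recognize with a WebM/Opus file.
import { readFileSync } from 'fs';

const client = new speech.SpeechClient();
const [response] = await client.recognize({
  recognizer: 'projects/my-project/locations/global/recognizers/my-recognizer',
  config: { autoDecodingConfig: {} }, // let the service detect WebM/Opus
  content: readFileSync('sample.webm'),
});
console.log(JSON.stringify(response.results, null, 2));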