
Surprising errors in production code using v2 of @google-cloud/speech

Open sorokinvj opened this issue 1 year ago • 12 comments

Hey guys, our system started producing a surprising number of errors today; our PROD server and all of our users are affected.

Error on "error" in recognizeStream {"code":3,"details":"Audio data does not appear to be in a supported encoding. If you believe this to be incorrect, try explicitly specifying the decoding parameters.","metadata":{}}

We did not change our implementation, and as far as I know the recent Google Chrome update did not touch the audio interfaces either. We are using the MediaRecorder API, and up until today all users had their streams recognized successfully.

Here is our main service:

type StreamingRecognitionConfig =
  protos.google.cloud.speech.v2.IStreamingRecognitionConfig;

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizer = findRecognizerByLanguageCode(language).name;

      const streamingConfig: StreamingRecognitionConfig = {
        config: {
          autoDecodingConfig: {},
        },
        streamingFeatures: {
          interimResults: false,
          enableVoiceActivityEvents: true, // enable voice activity events
          voiceActivityTimeout: {
            speechStartTimeout: { seconds: 60 },
            speechEndTimeout: { seconds: 60 },
          },
        },
      };
      const configRequest = {
        recognizer,
        streamingConfig,
      };

      logger.info('Creating Google service with recognizer:', recognizer);

      const recognizeStream = client
        ._streamingRecognize()
        .on('error', error => {
          logger.error(
            'Error on "error" in recognizeStream',
            JSON.stringify(error)
          );
          send({ type: 'ERROR', data: parseErrorMessage(error) });
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          logger.warn('Google recognizeStream ended');
        });

      let configSent = false;
      let headersSent = false;
      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!configSent) {
          recognizeStream.write(configRequest);
          configSent = true;
          return;
        }
        if (configSent && !headersSent) {
          recognizeStream.write({ audio: headers });
          headersSent = true;
          return;
        }
        recognizeStream.write({ audio });
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };
      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};
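As an aside, the gating inside transcribeAudio above is a three-state sequencer (config, then headers, then audio). A sketch of that ordering in isolation, independent of the client library (names like makeFrameSequencer are ours, not from the issue):

```typescript
// Sketch of the write-ordering logic used in transcribeAudio above:
// v2 streaming expects the config request first, then the one-time WebM
// headers, then raw audio chunks.
type FrameKind = 'config' | 'headers' | 'audio';

const makeFrameSequencer = (): (() => FrameKind) => {
  let configSent = false;
  let headersSent = false;
  return () => {
    if (!configSent) {
      configSent = true;
      return 'config';
    }
    if (!headersSent) {
      headersSent = true;
      return 'headers';
    }
    return 'audio';
  };
};
```

Note that, as written, transcribeAudio returns early on the first two calls, so the audio chunks passed on those calls are never written to the stream; whether that is intentional depends on how the caller buffers or resends them.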

sorokinvj avatar Aug 09 '24 20:08 sorokinvj

Hey @sorokinvj, which file types are affected?

danielbankhead avatar Aug 09 '24 20:08 danielbankhead

> Hey @sorokinvj, which file types are affected?

Hey @danielbankhead, we are using real-time transcription. Surprisingly, until yesterday we were able to use real-time streaming with v2 and WEBM_OPUS encoding, although I now see that v2 lists no such encoding, only:

AUDIO_ENCODING_UNSPECIFIED = 0,
LINEAR16 = 1,
MULAW = 2,
ALAW = 3

Our setup used autoDecodingConfig: {}, though. Do you support 'audio/webm;codecs=opus' in v2?
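For what it's worth, the v2 proto also defines an ExplicitDecodingConfig, so if auto-detection misbehaves, pinning the decoding parameters explicitly may be worth trying. A sketch (the field values are our assumptions for 'audio/webm;codecs=opus' at 48 kHz, and whether WEBM_OPUS appears in the enum depends on the proto shipped with your installed @google-cloud/speech version):

```typescript
// Sketch: explicit decoding parameters for WebM/Opus in the v2 API.
// WEBM_OPUS availability in ExplicitDecodingConfig depends on the
// installed client/proto version; treat this as an assumption to verify.
const streamingConfig = {
  config: {
    explicitDecodingConfig: {
      encoding: 'WEBM_OPUS',
      sampleRateHertz: 48000,
      audioChannelCount: 1,
    },
  },
  streamingFeatures: {
    interimResults: false,
  },
};
```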

For now we have rolled back to v1 with the code below, and everything is back to normal:

export const createGoogleService = ({
  language,
  send,
}: {
  language: string;
  send: Sender<MachineEvent>;
}): Promise<TranscriptionService> => {
  return new Promise((resolve, reject) => {
    try {
      const client = new speech.SpeechClient({
        keyFilename: 'assistant-demo.json',
      });

      const recognizeStream = client
        .streamingRecognize({
          config: {
            encoding: 'WEBM_OPUS',
            sampleRateHertz: 48000,
            languageCode: language,
            enableAutomaticPunctuation: true,
            enableSpokenPunctuation: {
              value: true,
            },
          },
          interimResults: false,
          enableVoiceActivityEvents: true,
        })
        .on('error', error => {
          logger.error('Error on "error" in recognizeStream', error);
          send({ type: 'ERROR', data: parseErrorMessage(error) });
          reject(error);
        })
        .on('data', (data: StreamingRecognizeResponse) => {
          if (data.results.length > 0) {
            const transcription = transformGoogleResponse(data);
            if (transcription) {
              const transcriptionText = getText(transcription);
              if (!transcriptionText?.length) {
                // if the transcription is empty, do nothing
                return;
              }
              send({ type: 'NEW_TRANSCRIPTION', data: transcriptionText });
            }
          }
        })
        .on('end', () => {
          send({
            type: 'TRANSCRIPTION_SERVICE_CLOSED',
            data: 'TRANSCRIPTION_SERVICE_CLOSED',
          });
        });

      let headersSent = false;

      const transcribeAudio = (audio: Buffer, headers: Buffer) => {
        if (!headersSent) {
          recognizeStream.write(headers);
          headersSent = true;
          return;
        }
        recognizeStream.write(audio);
      };

      const stop = () => {
        if (recognizeStream) {
          recognizeStream.end();
        }
      };

      resolve({ stop, transcribeAudio });
    } catch (error) {
      logger.error('Error creating Google service:', error);
      reject(error);
    }
  });
};

On the frontend we are using the basic MediaRecorder API to send the data:

    navigator.mediaDevices
      .getUserMedia(constraints)
      .then((media) => {
        // Continue to play the captured audio to the user.
        const output = new AudioContext();
        const source = output.createMediaStreamSource(media);
        source.connect(output.destination);

        const audioStream = new MediaStream(media.getAudioTracks());
        const silenceDetector = new SilenceDetector(audioStream);
        const mediaRecorder = new MediaRecorder(audioStream, {
          mimeType: MIME_TYPE,
        });

        let audioHeaders: BlobEvent;
        mediaRecorder.ondataavailable = (event: BlobEvent) => {
          if (!audioHeaders) {
            audioHeaders = event;
          }

          const isSilent = silenceDetector?.getIsSilent();
          if (!isSilent) {
            if (!audioHeaders) {
              logger.error('No audio headers found');
              return;
            }
            sendAudioChunk(event, audioHeaders);
          }
        };

        mediaRecorder.start(TIMESLICE_INTERVAL);
      });

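sendAudioChunk is not shown in the snippet above; a hypothetical sketch, assuming a WebSocket-like transport with a send(data) method (the names AudioSocket and makeSendAudioChunk are ours, not from the issue):

```typescript
// Hypothetical sketch: sendAudioChunk is not part of the original snippet.
// Assumes `socket` is a WebSocket-like object with a send(data) method.
type AudioSocket = { send: (data: ArrayBuffer) => void };

const makeSendAudioChunk =
  (socket: AudioSocket) =>
  async (chunk: { data: Blob }, headers: { data: Blob }): Promise<void> => {
    // Send the WebM headers before each chunk; the server side
    // (transcribeAudio above) forwards the headers only once.
    socket.send(await headers.data.arrayBuffer());
    socket.send(await chunk.data.arrayBuffer());
  };
```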
sorokinvj avatar Aug 10 '24 13:08 sorokinvj

@danielbankhead We have the same use case and the same issue, and it is very difficult for us to move to v1. Any update on it? It is critical to our system.

meitarbe avatar Aug 12 '24 11:08 meitarbe

WEBM OPUS should be supported:

  • https://cloud.google.com/speech-to-text/docs/encoding
  • https://github.com/googleapis/google-cloud-node/blob/34e36a6c72a21ef8bb383233d15e2b82dfada8da/packages/google-cloud-speech/protos/google/cloud/speech/v2/cloud_speech.proto#L708

I will see what’s going on.

danielbankhead avatar Aug 12 '24 14:08 danielbankhead

> WEBM OPUS should be supported:
>
>   • https://cloud.google.com/speech-to-text/docs/encoding
>   • https://github.com/googleapis/google-cloud-node/blob/34e36a6c72a21ef8bb383233d15e2b82dfada8da/packages/google-cloud-speech/protos/google/cloud/speech/v2/cloud_speech.proto#L708
>
> I will see what’s going on.

@danielbankhead
Thanks for the quick reply! If you need more info: it seems the errors started around Aug 6 (we began seeing tons of them in our logs on GCP). I also tested webm files that I am 100% sure worked before (we save each audio file together with the text it produced), and they no longer work even though nothing changed on our side.

meitarbe avatar Aug 12 '24 15:08 meitarbe

Also experiencing this issue, also with WebM and seems to have broken a few days ago.

paullombardcartello avatar Aug 12 '24 19:08 paullombardcartello

Also experiencing this issue

asafda avatar Aug 13 '24 08:08 asafda

Update: the service team is aware of this issue; I should have another update soon.

danielbankhead avatar Aug 13 '24 22:08 danielbankhead

Any updates?

paullombardcartello avatar Aug 21 '24 10:08 paullombardcartello

A fix is rolling out and should be available shortly.

danielbankhead avatar Aug 21 '24 21:08 danielbankhead

@danielbankhead any news on the fix? Is it available already? Do you know which release version I should be looking for?

sorokinvj avatar Aug 28 '24 10:08 sorokinvj

The issue is on the service side; no update is required on the client side. The rollback should be fully rolled out by now; however, I'm waiting for the service team to confirm.

danielbankhead avatar Aug 28 '24 18:08 danielbankhead

The fix should be widely available now.

danielbankhead avatar Aug 30 '24 16:08 danielbankhead

I can confirm that the problem is fixed. I sent an audio/webm file to the google-cloud-speech v2 recognize functionality and it worked (it didn't tell me the file format was invalid).

felabrecque avatar Sep 03 '24 15:09 felabrecque