google-cloud-php

Speech API: Cannot use explicit_decoding_config with encoding = ENCODING_UNSPECIFIED

Open jfradj opened this issue 9 months ago • 0 comments

Hello,

I want to use the speech API to convert speech into text.


TL;DR

Using:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
]); 

Throws this error:

Invalid audio channel count value: 0. Values must be non-negative.

While using:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
    'audio_channel_count' => 2,
]); 

Throws this error:

The RecognitionConfig proto is invalid:
  * explicit_decoding_config.audio_channel_count: audio_channel_count isn't supported by the set encoding

Long and detailed version for the courageous ones :)

Environment details

  • OS: MacOS Sonoma 14.3 (23D56)
  • PHP version: PHP 8.2.17
  • Package name and version: google/cloud-speech 1.18.2

Steps to reproduce

I'm working with .aac audio files (generated by Instagram). I used the online GUI (https://console.cloud.google.com/speech/transcriptions) to check whether the .aac file would be supported, and it worked [Screenshot 2024-05-04 at 07:41:53].

When using the GUI, after uploading the file I get the warning "Unable to automatically detect audio information. Please review your audio file and enter the relevant fields manually." So I filled in the fields manually:

  • Encoding = ENCODING_UNSPECIFIED
  • Sample rate = 16000
  • Channel count remains empty

This worked as shown on the screenshot above.

Then I wanted to do the same thing in code using the google/cloud-speech package.

I tried to use the auto_decoding_config option but got the following error:

Audio data does not appear to be in a supported encoding. If you believe this to be incorrect, try explicitly specifying the decoding parameters.

Which is the same behavior as the GUI.
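For reference, one way to see what the downloaded bytes actually contain before sending them is to look at the first few bytes. The sketch below is only illustrative: `sniffAudioContainer` is a hypothetical helper (not part of google/cloud-speech), and it only checks two well-known signatures, the ADTS sync word used by raw .aac streams and the `ftyp` box of MP4/M4A containers. It does not skip a leading ID3 tag, so a tagged file would come back as unknown.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: guess the container of an audio payload from its
 * leading bytes. Only ADTS (raw AAC) and MP4-family files are recognized.
 */
function sniffAudioContainer(string $bytes): string
{
    if (strlen($bytes) < 8) {
        return 'unknown';
    }

    // MP4/M4A/MOV files carry the box type 'ftyp' at byte offset 4.
    if (substr($bytes, 4, 4) === 'ftyp') {
        return 'mp4';
    }

    // ADTS frames start with a 12-bit sync word (0xFFF) followed by
    // one version bit and two layer bits that are always 00.
    $b0 = ord($bytes[0]);
    $b1 = ord($bytes[1]);
    if ($b0 === 0xFF && ($b1 & 0xF6) === 0xF0) {
        return 'adts-aac';
    }

    return 'unknown';
}
```

Calling `sniffAudioContainer($content)` on the bytes returned by `file_get_contents()` would at least tell whether the file is a raw AAC stream or an MP4-style container, which may help when choosing decoding parameters.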

So I tried to use the explicit_decoding_config parameter and it failed. See code below.

Code example

$audioFile = 'https://lookaside.fbsbx.com/ig_messaging_cdn/?asset_id=374095301647771&signature=AbxHJBUywVeA26a-1lSTIeODgXgrAsmxD7pCjaxDo7nNowZZvgE_3fC5jMA3H-9UX7AtT7vdNe3N772RgQpNbgBsvmfp3eT439xW14QykJsqVfvg0aC_GVOJ6sBLBhqDyEzDv7Vt08pCStD0dHvG7PHcL7Gp4RvddKRT_TSYVBQP3PTFPiECX9PsMK528lRG4FaYYIAXN4sBcyeIZsRK6EiiWxo_6g';

$client = new Google\Cloud\Speech\V2\Client\SpeechClient();

$content = file_get_contents($audioFile);

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
]);

$config = new Google\Cloud\Speech\V2\RecognitionConfig([
    'explicit_decoding_config' => $explicitConfig,
    'language_codes' => ['en-EN'],
    'model' => 'latest_long',
]);

$request = new Google\Cloud\Speech\V2\RecognizeRequest([
    'recognizer' => 'projects/{MY_PROJECT_ID}/locations/global/recognizers/_',
    'config' => $config,
    'content' => $content,
]);

$response = $client->recognize($request);
$results = $response->getResults();

foreach ($results as $result) {
    $alternatives = $result->getAlternatives();
    $mostLikely = $alternatives[0];
    $transcript = $mostLikely->getTranscript();
    $confidence = $mostLikely->getConfidence();
    printf('Transcript: %s' . PHP_EOL, $transcript);
    printf('Confidence: %s' . PHP_EOL, $confidence);
}

This code throws the following error:

Invalid audio channel count value: 0. Values must be non-negative.

And setting the audio channel count like this:

$explicitConfig = new Google\Cloud\Speech\V2\ExplicitDecodingConfig([
    'encoding' => Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding::ENCODING_UNSPECIFIED,
    'sample_rate_hertz' => 16000,
    'audio_channel_count' => 2,
]); 

Throws this error:

The RecognitionConfig proto is invalid:
  * explicit_decoding_config.audio_channel_count: audio_channel_count isn't supported by the set encoding
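A side note on the snippets above: they pass a V1 `AudioEncoding` constant into a V2 `ExplicitDecodingConfig`. Generated protobuf enums in PHP are plain integer constants, so only the number reaches the message. The sketch below is a rough illustration, assuming the numeric values listed in it (mirrored by hand from the public Speech protos, partial, and possibly out of date for a given package version). `v2NameForV1Constant` and both tables are hypothetical names, not library API.

```php
<?php
declare(strict_types=1);

// Assumed numeric values, copied by hand from the public Speech protos.
// Check them against the generated enum classes in the installed package.
const V1_AUDIO_ENCODING = [
    'ENCODING_UNSPECIFIED' => 0,
    'LINEAR16' => 1,
    'FLAC' => 2,
    'MULAW' => 3,
];
const V2_EXPLICIT_AUDIO_ENCODING = [
    'AUDIO_ENCODING_UNSPECIFIED' => 0,
    'LINEAR16' => 1,
    'MULAW' => 2,
    'ALAW' => 3,
];

/**
 * Hypothetical helper: name the V2 encoding that a V1 constant's number
 * would select, or null if the number is not in the V2 table above.
 */
function v2NameForV1Constant(string $v1Name): ?string
{
    $number = V1_AUDIO_ENCODING[$v1Name] ?? null;
    if ($number === null) {
        return null;
    }

    // Strict search so integer 0 only matches integer 0.
    $name = array_search($number, V2_EXPLICIT_AUDIO_ENCODING, true);

    return $name === false ? null : $name;
}
```

Under these assumptions, the V1 `ENCODING_UNSPECIFIED` constant lands on the V2 unspecified value, while a constant such as V1 `MULAW` would silently select a different V2 encoding, so using the V2 enum class with the V2 message seems the safer choice.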

Thanks for your help.

Regards, Johann

jfradj, May 04 '24 06:05