Speech-to-Text v2 Inline Phrase Sets with Recognizer

Open vamaral1 opened this issue 2 months ago • 0 comments

Determine this is the right repository

[x] I determined this is the correct repository in which to report this bug.

Summary of the issue

Bug Report: Speech-to-Text v2 Inline Phrase Sets with Recognizer

Summary

Error: speech_client.recognize(request=request) raises google.api_core.exceptions.NotFound: 404 Requested entity was not found.
Context: The error occurs when using inline phrase sets for speech adaptation with a dedicated recognizer.
Reference: https://cloud.google.com/speech-to-text/docs/adaptation-model
Observation: Removing adaptation from RecognitionConfig makes transcription work, which indicates the failure is tied to the adaptation field.

What I've Verified

1. Structure is Correct ✅

Inline phrase set is constructed correctly
oneof constraint is respected (phrase_set vs inline_phrase_set)
Phrase values and boosts are set correctly
Recognizer path is correct
Regional endpoint is correct: us-speech.googleapis.com
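
The last two checks can be scripted. The following is a minimal sketch, not part of the original reproduction: `check_recognizer_target` is a hypothetical helper, and it assumes regional hosts follow the `{location}-speech.googleapis.com` pattern seen above, with `speech.googleapis.com` for the `global` location. It also flags the literal text `None`, which ends up in the path when an environment variable read with os.getenv is unset.

```python
import re

# Resource name format used in the request: projects/*/locations/*/recognizers/*
_RECOGNIZER_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/recognizers/(?P<recognizer>[^/]+)$"
)


def check_recognizer_target(recognizer_path: str, host: str) -> list[str]:
    """Return a list of readable problems; an empty list means none were found."""
    match = _RECOGNIZER_RE.match(recognizer_path)
    if match is None:
        return [f"malformed recognizer path: {recognizer_path!r}"]

    problems = []
    for field, value in match.groupdict().items():
        # f-strings render a missing os.getenv() result as the text "None".
        if value == "None":
            problems.append(f"{field} is the literal string 'None' (unset variable?)")

    location = match.group("location")
    # Assumption: regional hosts are "<location>-speech.googleapis.com".
    expected = (
        "speech.googleapis.com"
        if location == "global"
        else f"{location}-speech.googleapis.com"
    )
    if host != expected:
        problems.append(f"host {host!r} does not match location {location!r}")
    return problems


print(check_recognizer_target(
    "projects/my-project/locations/us/recognizers/my-recognizer",
    "us-speech.googleapis.com",
))
```

An empty list only rules out a malformed or mismatched target on the client side; it says nothing about whether the recognizer actually exists in the project.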

2. What I've Tried ❌

✅ Fixed oneof violations: Ensured only inline_phrase_set is set (not phrase_set)
✅ Removed referenced phrase sets: Tested with only inline phrase sets
✅ Verified protobuf structure: Used _pb.WhichOneof("value") to confirm active field
✅ Added/removed model and language_codes: Tried both with and without explicit values
✅ Tested project ID consistency

3. Current Code State

✅ Creates inline phrase sets from player names (privacy-compliant)
✅ No referenced phrase sets
✅ Correct oneof usage
✅ Recognizer path and endpoint are correct
❌ Still getting 404 error

Key Findings

From Documentation Review

✅ Documentation states inline phrase sets are supported
⚠️ No explicit restriction mentioned for recognizers
⚠️ Some community reports of inconsistent behavior with inline phrase sets in v2

From Source Code Analysis

AdaptationPhraseSet has a oneof field: phrase_set (string) vs inline_phrase_set (PhraseSet message)
Setting one should automatically clear the other (proto-plus-python behavior)
PhraseSet.name has no field presence, so it cannot be explicitly unset

Hypothesis

Most Likely: Inline phrase sets may not actually be supported when using a recognizer in Speech-to-Text v2, despite documentation suggesting otherwise. This could be:

An API limitation/bug
A validation issue where the API tries to resolve something that doesn't exist
A conflict between recognizer defaults and inline adaptation
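
One way to tell these explanations apart is to send the same audio under a small matrix of configurations and record which combinations fail. The sketch below is illustrative only: `run_matrix` and `send` are hypothetical names, and `send` stands in for a function that builds the request and calls speech_client.recognize. The `_` recognizer ID refers to the implicit default recognizer that Speech-to-Text v2 accepts in place of a named one.

```python
from itertools import product
from typing import Callable


def run_matrix(send: Callable[[str, bool], object], recognizer_ids: list[str]) -> dict:
    """Call send(recognizer_id, use_adaptation) for every combination.

    Returns a dict mapping (recognizer_id, use_adaptation) to "ok" or to the
    name and message of the exception that was raised.
    """
    outcomes = {}
    for recognizer_id, use_adaptation in product(recognizer_ids, (False, True)):
        try:
            send(recognizer_id, use_adaptation)
            outcomes[(recognizer_id, use_adaptation)] = "ok"
        except Exception as exc:  # record every failure instead of stopping
            outcomes[(recognizer_id, use_adaptation)] = f"{type(exc).__name__}: {exc}"
    return outcomes


# Stand-in for the real call, used only to show the output shape. It ignores
# the recognizer ID and fails whenever adaptation is on; the real outcome for
# the "_" recognizer has not been measured.
def fake_send(recognizer_id: str, use_adaptation: bool) -> str:
    if use_adaptation:
        raise LookupError("404 Requested entity was not found.")
    return "transcript"


for key, outcome in run_matrix(fake_send, ["my-recognizer", "_"]).items():
    print(key, outcome)
```

With a real `send`, a success for `_` with adaptation enabled alongside a failure for the dedicated recognizer would point toward a conflict with recognizer defaults, while a failure in both cases would point toward inline adaptation itself.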

Current Status

✅ Code Structure: Correct
❌ Error: Still persists (404 when adaptation is used)
❌ Workaround: None found yet
Alternatives under consideration:
- Use referenced phrase sets (privacy concern: requires storing names)
- Don't use a dedicated recognizer (loses recognizer benefits)

Conclusion

The code is correct according to the API structure, but the API still returns 404, suggesting a potential limitation or bug in the API itself.

API client name and version

google-cloud-speech 2.33.0

Reproduction steps: code

Code Snippet

import os
import urllib.request
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

names = ["foo", "bar"]
project_id = os.getenv("GCLOUD_PROJECT")
recognizer_id = os.getenv("RECOGNIZER_ID")
audio_url = os.getenv("AUDIO_URL")
# Audio is fetched from Firebase emulator storage, e.g. http://127.0.0.1:9199/v0/b/{project_id}.firebasestorage.app/o/{bucket}...
with urllib.request.urlopen(audio_url) as response:
    audio_data = response.read()

transport = SpeechClient.get_transport_class("grpc")(
    host="us-speech.googleapis.com"
)
speech_client = SpeechClient(transport=transport)

phrase_set = cloud_speech.PhraseSet(
    phrases=[{"value": name, "boost": 10} for name in names],
)
adaptation = cloud_speech.SpeechAdaptation(
    phrase_sets=[
        cloud_speech.SpeechAdaptation.AdaptationPhraseSet(
            inline_phrase_set=phrase_set
        )
    ],
)

config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    adaptation=adaptation,
    language_codes=["en-US"],
    model="chirp_3",
    features=cloud_speech.RecognitionFeatures(
        enable_automatic_punctuation=True,
        diarization_config=cloud_speech.SpeakerDiarizationConfig(
            min_speaker_count=1,
            max_speaker_count=2,
        ),
    ),
)

request = cloud_speech.RecognizeRequest(
    recognizer=f"projects/{project_id}/locations/us/recognizers/{recognizer_id}",
    config=config,
    content=audio_data,
)

response = speech_client.recognize(request=request)
print(response)

Reproduction steps: supporting files

No response

Reproduction steps: actual results

Stack Trace

Traceback (most recent call last):
  File "/app/functions/venv/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/functions/venv/lib/python3.11/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
                             ^^^^^^^^^^^^^^^^
  File "/app/functions/venv/lib/python3.11/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
           ^^^^^^^^^^^^^
  File "/app/functions/venv/lib/python3.11/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/app/functions/venv/lib/python3.11/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/functions/venv/lib/python3.11/site-packages/grpc/_channel.py", line 1192, in with_call
    return _end_unary_response_blocking(state, call, True, None)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/functions/venv/lib/python3.11/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.NOT_FOUND
	details = "Requested entity was not found."
	debug_error_string = "UNKNOWN:Error received from peer ipv4:[GOOGLE_API_IP]:443 {grpc_status:5, grpc_message:"Requested entity was not found."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/functions/test_speech_recognition.py", line 60, in <module>
    response = speech_client.recognize(request=request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/functions/venv/lib/python3.11/site-packages/google/cloud/speech_v2/services/speech/client.py", line 1790, in recognize
    response = rpc(
               ^^^^
  File "/app/functions/venv/lib/python3.11/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/functions/venv/lib/python3.11/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.NotFound: 404 Requested entity was not found.

Reproduction steps: expected results

results {
  alternatives {
    transcript: "..."
  }
}

OS & version + platform

OS: Linux
Version: 6.10.14-linuxkit
Platform: Containerized Linux (linuxkit)

Python environment

Python 3.11.0rc1

Python dependencies

No response

Additional context

No response

Nov 14 '25 18:11 vamaral1