speech_recognition

Adds Parameter use_enhanced and model to GoogleCloudSpeech

Open HideyoshiNakazone opened this issue 1 year ago • 5 comments

Adds the parameters use_enhanced and model to the recognize_google_cloud method, giving users more configuration options and better results in specific cases.

HideyoshiNakazone avatar Feb 15 '24 19:02 HideyoshiNakazone

Hello @Uberi and @ftnext, I was wondering if it would be possible for someone to review my merge request.

Thank you very much, Vitor Hideyoshi.

HideyoshiNakazone avatar Feb 22 '24 00:02 HideyoshiNakazone

Hello @ftnext, is there any interest in this feature? It doesn't break any of the GoogleCloudSpeech Python API; it only extends it. I'm already using this implementation at the company I work for, but would love to have this feature merged. If anything is blocking the merge, please tell me :)

HideyoshiNakazone avatar Apr 22 '24 20:04 HideyoshiNakazone

Hi @HideyoshiNakazone!

Looks good overall, but would it be possible to document these parameters in the docs for that function? If so, happy to merge this!

Uberi avatar Apr 26 '24 18:04 Uberi

@Uberi, thanks a lot! I added the parameters to the docstring of the method Recognizer.recognize_google_cloud and added them to the library reference file. If there are any other places you'd like me to add documentation, I'll be happy to :)

HideyoshiNakazone avatar Apr 26 '24 19:04 HideyoshiNakazone

@HideyoshiNakazone Thank you very much for this pull request! I'm very sorry for the late response. @Uberi Thanks for your comment!

In my opinion, it would be better to introduce arbitrary keyword arguments (a.k.a. **kwargs): https://docs.python.org/3/tutorial/controlflow.html#keyword-arguments

Certainly, adding use_enhanced and model as arguments would implement this feature. However, if more arguments need to be added in the future, the signature would have to be changed again each time (not easy to extend).

I think it would be preferable for Cloud Speech API-specific arguments to be specified as variable keyword arguments.

def recognize_google_cloud(self, audio_data, credentials_json=None, language="en-US", preferred_phrases=None, show_all=False, **api_params):
    """
    If ``preferred_phrases`` is an iterable of phrase strings, ...

    api_params: Cloud Speech API-specific parameters as dict (optional)

        The ``use_enhanced`` is a boolean option ...

        Furthermore, you can use the option ``model`` to set your desired model,

    Returns the most likely transcription if ``show_all`` is False (the default).
    """

    config = {
        'encoding': speech.RecognitionConfig.AudioEncoding.FLAC,
        'sample_rate_hertz': audio_data.sample_rate,
        'language_code': language,
        **api_params,
    }

(It seems that preferred_phrases could be passed through api_params too, but that is a separate issue)
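
To illustrate the merging behaviour of the proposal above, here is a minimal, self-contained sketch. It is not the library's implementation: `build_recognition_config` is a hypothetical helper, the string "FLAC" stands in for `speech.RecognitionConfig.AudioEncoding.FLAC` so the example runs without the Google client installed, and "phone_call" is used only as an example model value.

```python
def build_recognition_config(sample_rate_hertz, language="en-US", **api_params):
    """Hypothetical helper: merge Cloud Speech API-specific options into a base config."""
    config = {
        # Placeholder for speech.RecognitionConfig.AudioEncoding.FLAC (assumption,
        # used so the sketch has no external dependencies).
        "encoding": "FLAC",
        "sample_rate_hertz": sample_rate_hertz,
        "language_code": language,
        # Unpacked last, so any key passed by the caller is added to the config,
        # and a caller-supplied key with the same name would override a default.
        **api_params,
    }
    return config


# Callers pass API-specific options by keyword; no signature change is needed
# when a new option such as use_enhanced or model is wanted.
config = build_recognition_config(16000, use_enhanced=True, model="phone_call")
print(config)
```

One consequence of this design is that the function no longer validates option names: a misspelled keyword would be forwarded into the config as-is rather than raising a TypeError at the call site.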

ftnext avatar Apr 29 '24 15:04 ftnext