jigasi icon indicating copy to clipboard operation
jigasi copied to clipboard

Configurable Google Speech to Text transcription

Open kratu92 opened this issue 3 years ago • 7 comments

Options to enable punctuation and profanity filter to the results obtained when using Google Speech to Text for transcriptions. Added option to enable the reception of interim results instead of only final results when using Google Speech to Text for transcriptions. These options may be set in the sip-communicator.properties: org.jitsi.jigasi.transcription.ENABLE_GOOGLE_AUTOMATIC_PUNCTUATION=false org.jitsi.jigasi.transcription.ENABLE_GOOGLE_PROFANITY_FILTER=false org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=false I left them false by default so that the default behaviour does not change unless the user sets them to true.

kratu92 avatar Jun 10 '22 12:06 kratu92

If I remember correctly, without interum results, transcriptions can take 2 to 3 seconds to appear (end of sentence). If this is the case, what is the benefit of having a feature flag (org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS) which decreases the UX? What is the use-case for setting org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=False?

nikvaessen avatar Jun 13 '22 06:06 nikvaessen

Setting org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=False is for the current default behaviour (before my change), when you don't receive the result until the sentence is finished, so that is why I left it as the default value. Changing it to true will send the partial sentences while the active speaker is speaking (Vosk had this feature). It can take a couple of seconds if the sentence is long, and we found out that people with hearing disability prefer to read the transcription while they see the person speaking than wait until the sentence has finished, even if the transcription is not 100% accurate and it changes a bit and is updated later. Also people with cognitive disability may find it confusing to see the person speaking and not see the transcription for a while. This way you have the option to have the default functionality, before the change (leaving it false) or have the partial results (changing it to true).

kratu92 avatar Jun 13 '22 06:06 kratu92

There's existing code to enable interim results, look for RETRIEVE_INTERIM_RESULTS.

Unless I'm missing something, your addition of P_NAME_ENABLE_GOOGLE_INTERIM_RESULTS does one thing only: when you enable the flag, it forces TranscriptionResult.isInterim to false regardless of whether the result is interim or not. My guess is the existing code fails to do something with interim results for you, but I wuld prefer to chase down the issue instead of forcing isInterim=false. Perhaps this is where the problem stems from?

bgrozev avatar Jun 13 '22 14:06 bgrozev

I saw the RETRIEVE_INTERIM_RESULTS constant private final static boolean RETRIEVE_INTERIM_RESULTS in GoogleCloudTranscriptionService.java, which is always set to true, and is used in the config sent by jigasi to the Google API to obtain the transcriptions, but those results are not returned afterwards to the user unless they are final. What we need are those intermediate results to display them to the user temporarily. In my last commit I tried to move the ENABLE_GOOGLE_INTERIM_RESULTS parameter to the RemotePublisherTranscriptionHandler class that you told me (the same way I did in GoogleCloudTranscriptionService but renaming the parameter to ENABLE_INTERIM_RESULTS as it no longer belongs to the google transcription only) and changing the line you linked to the following: if (!enableInterimResults && result.isInterim()) This way it also works fine, when the parameter is set to true, it sends the JSON with the interim results and it may be a better solution. However when setting the option SEND_TEXT to true instead of SEND_JSON in the sip-communicator.properties it has no effect, it does not send the interim results, just the final ones. I'm using the JSON option, but if someone needs to use the TEXT option, and sets the parameter to true it may not work as intended. I don't know if that is what you were suggesting to change and hope you get a better understanding of what we intend to achieve.

kratu92 avatar Jun 14 '22 14:06 kratu92

This looks good to me, but I'll let @damencho review.

bgrozev avatar Jun 14 '22 16:06 bgrozev

Codecov Report

Merging #429 (886a387) into master (e05924f) will decrease coverage by 0.08%. The diff coverage is 0.00%.

:exclamation: Current head 886a387 differs from pull request most recent head 98be4db. Consider uploading reports for the commit 98be4db to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #429      +/-   ##
============================================
- Coverage     23.14%   23.05%   -0.09%     
  Complexity      303      303              
============================================
  Files            69       69              
  Lines          5808     5817       +9     
  Branches        787      788       +1     
============================================
- Hits           1344     1341       -3     
- Misses         4232     4244      +12     
  Partials        232      232              
Impacted Files Coverage Δ
...transcription/GoogleCloudTranscriptionService.java 0.00% <0.00%> (ø)
...scription/RemotePublisherTranscriptionHandler.java 0.00% <0.00%> (ø)
...n/java/org/jitsi/jigasi/JigasiBundleActivator.java 49.41% <0.00%> (-1.18%) :arrow_down:
src/main/java/org/jitsi/jigasi/JvbConference.java 44.88% <0.00%> (-0.64%) :arrow_down:
.../org/jitsi/jigasi/TranscriptionGatewaySession.java 0.00% <0.00%> (ø)
...c/main/java/org/jitsi/jigasi/stats/Statistics.java 42.38% <0.00%> (+0.66%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3fdc420...98be4db. Read the comment docs.

codecov[bot] avatar Jun 14 '22 16:06 codecov[bot]

Having automatic punctuation would be really nice. Did something block this PR?

dnna avatar Nov 08 '23 23:11 dnna