Configurable Google Speech to Text transcription
Options to enable punctuation and profanity filter to the results obtained when using Google Speech to Text for transcriptions. Added option to enable the reception of interim results instead of only final results when using Google Speech to Text for transcriptions. These options may be set in the sip-communicator.properties: org.jitsi.jigasi.transcription.ENABLE_GOOGLE_AUTOMATIC_PUNCTUATION=false org.jitsi.jigasi.transcription.ENABLE_GOOGLE_PROFANITY_FILTER=false org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=false I left them false by default so that the default behaviour does not change unless the user sets them to true.
If I remember correctly, without interum results, transcriptions can take 2 to 3 seconds to appear (end of sentence). If this is the case, what is the benefit of having a feature flag (org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS) which decreases the UX? What is the use-case for setting org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=False?
Setting org.jitsi.jigasi.transcription.ENABLE_GOOGLE_INTERIM_RESULTS=False is for the current default behaviour (before my change), when you don't receive the result until the sentence is finished, so that is why I left it as the default value. Changing it to true will send the partial sentences while the active speaker is speaking (Vosk had this feature). It can take a couple of seconds if the sentence is long, and we found out that people with hearing disability prefer to read the transcription while they see the person speaking than wait until the sentence has finished, even if the transcription is not 100% accurate and it changes a bit and is updated later. Also people with cognitive disability may find it confusing to see the person speaking and not see the transcription for a while. This way you have the option to have the default functionality, before the change (leaving it false) or have the partial results (changing it to true).
There's existing code to enable interim results, look for RETRIEVE_INTERIM_RESULTS.
Unless I'm missing something, your addition of P_NAME_ENABLE_GOOGLE_INTERIM_RESULTS does one thing only: when you enable the flag, it forces TranscriptionResult.isInterim to false regardless of whether the result is interim or not. My guess is the existing code fails to do something with interim results for you, but I wuld prefer to chase down the issue instead of forcing isInterim=false. Perhaps this is where the problem stems from?
I saw the RETRIEVE_INTERIM_RESULTS constant private final static boolean RETRIEVE_INTERIM_RESULTS in GoogleCloudTranscriptionService.java, which is always set to true, and is used in the config sent by jigasi to the Google API to obtain the transcriptions, but those results are not returned afterwards to the user unless they are final.
What we need are those intermediate results to display them to the user temporarily.
In my last commit I tried to move the ENABLE_GOOGLE_INTERIM_RESULTS parameter to the RemotePublisherTranscriptionHandler class that you told me (the same way I did in GoogleCloudTranscriptionService but renaming the parameter to ENABLE_INTERIM_RESULTS as it no longer belongs to the google transcription only) and changing the line you linked to the following:
if (!enableInterimResults && result.isInterim())
This way it also works fine, when the parameter is set to true, it sends the JSON with the interim results and it may be a better solution. However when setting the option SEND_TEXT to true instead of SEND_JSON in the sip-communicator.properties it has no effect, it does not send the interim results, just the final ones. I'm using the JSON option, but if someone needs to use the TEXT option, and sets the parameter to true it may not work as intended.
I don't know if that is what you were suggesting to change and hope you get a better understanding of what we intend to achieve.
This looks good to me, but I'll let @damencho review.
Codecov Report
Merging #429 (886a387) into master (e05924f) will decrease coverage by
0.08%. The diff coverage is0.00%.
:exclamation: Current head 886a387 differs from pull request most recent head 98be4db. Consider uploading reports for the commit 98be4db to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #429 +/- ##
============================================
- Coverage 23.14% 23.05% -0.09%
Complexity 303 303
============================================
Files 69 69
Lines 5808 5817 +9
Branches 787 788 +1
============================================
- Hits 1344 1341 -3
- Misses 4232 4244 +12
Partials 232 232
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...transcription/GoogleCloudTranscriptionService.java | 0.00% <0.00%> (ø) |
|
| ...scription/RemotePublisherTranscriptionHandler.java | 0.00% <0.00%> (ø) |
|
| ...n/java/org/jitsi/jigasi/JigasiBundleActivator.java | 49.41% <0.00%> (-1.18%) |
:arrow_down: |
| src/main/java/org/jitsi/jigasi/JvbConference.java | 44.88% <0.00%> (-0.64%) |
:arrow_down: |
| .../org/jitsi/jigasi/TranscriptionGatewaySession.java | 0.00% <0.00%> (ø) |
|
| ...c/main/java/org/jitsi/jigasi/stats/Statistics.java | 42.38% <0.00%> (+0.66%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact),ø = not affected,? = missing dataPowered by Codecov. Last update 3fdc420...98be4db. Read the comment docs.
Having automatic punctuation would be really nice. Did something block this PR?