jigasi
Added UTF-8 support for SEND JSON POST requests
Description
This PR adds UTF-8 encoding support to the JSON POST requests that Jigasi's transcription module sends to the URLs configured in SEND_JSON_REMOTE_URLS, for example:
org.jitsi.jigasi.transcription.SEND_JSON_REMOTE_URLS=https://ts.meet.jit.si/transcriptions
This ensures that non-ASCII characters are handled correctly, which matters for languages such as Hindi, Tamil, and Japanese.
Changes:
Explicitly set the Content-Type header to application/json; charset=UTF-8 to indicate that the JSON data is UTF-8 encoded.
Converted the JSON string to bytes using UTF-8 explicitly instead of relying on the platform default charset.
Change-1:
conn.setRequestProperty("Content-Type", "application/json");
To:
conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
Change-2:
os.write(json.toString().getBytes());
To:
os.write(json.toString().getBytes("UTF-8"));
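Taken together, the two changes amount to the pattern below. This is a minimal, self-contained sketch and not the code in this PR: the class name Utf8JsonPost and the methods toUtf8 and post are hypothetical, and it uses StandardCharsets.UTF_8 instead of the string name "UTF-8", an alternative that avoids the checked UnsupportedEncodingException.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Jigasi.
final class Utf8JsonPost {
    // The header value introduced by Change-1.
    static final String CONTENT_TYPE = "application/json; charset=UTF-8";

    // Encode the JSON text as UTF-8 regardless of the JVM default charset
    // (the intent of Change-2).
    static byte[] toUtf8(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // POST the JSON text to the given URL and return the HTTP status code.
    static int post(URL url, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", CONTENT_TYPE);
            try (OutputStream os = conn.getOutputStream()) {
                os.write(toUtf8(json));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

Declaring the charset in the header and using the same charset for the bytes keeps the two in agreement, so the receiver does not have to guess how to decode the body.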
Motivation:
Transcription worked well in English, but switching to Hindi or other languages produced text full of question marks on the receiving end, which points to an encoding problem: getBytes() without arguments uses the JVM's default charset, and characters that charset cannot represent are typically replaced with '?'. Sending the data explicitly as UTF-8 is intended to resolve this so that non-ASCII characters are interpreted correctly.
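The symptom can be reproduced in isolation. The sketch below is illustrative only: the class EncodingLossDemo and the method roundTrip are hypothetical, and US_ASCII stands in for whatever non-UTF-8 default charset the affected deployment had, which this PR does not state.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, for illustration only; not part of Jigasi.
final class EncodingLossDemo {
    // Encode text with the sender's charset, then decode it as UTF-8,
    // as a receiver expecting UTF-8 JSON would.
    static String roundTrip(String text, Charset senderCharset) {
        byte[] wire = text.getBytes(senderCharset);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hindi "namaste", written with escapes so the source file stays ASCII.
        String hindi = "\u0928\u092e\u0938\u094d\u0924\u0947";

        // US-ASCII cannot represent Devanagari, so each char becomes '?'.
        System.out.println(roundTrip(hindi, StandardCharsets.US_ASCII));

        // UTF-8 preserves the text (how it displays depends on the console).
        System.out.println(roundTrip(hindi, StandardCharsets.UTF_8));
    }
}
```

With an explicit UTF-8 charset the round trip is lossless, which is the behavior the change aims for.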
Testing:
Tested the transcription feature with multiple languages, including Hindi, Tamil, and Japanese.
Verified that the JSON POST requests sent to the URL configured in Jigasi's sip-communicator.properties are UTF-8 encoded:
org.jitsi.jigasi.transcription.SEND_JSON_REMOTE_URLS=https://ts.meet.jit.si/transcriptions
Impact:
With this change, transcripts that Jigasi sends over JSON POST keep their non-ASCII characters intact, so transcription can be used with a wide range of languages without encoding-related corruption.