[sil_bengali_phonetic] questions about Bengali keyboard
Background
Note: I'm converting this keyboard from https://github.com/keymanapp/ldml-keyboards-dev/blob/master/keyboards/sil-bengali/bn-t-k0-cldr-phonetic.ldml to LDML 3, but I think it originally came from sil_bengali_phonetic. There are some variations, but the two are pretty similar.
More comments are on https://github.com/unicode-org/cldr/pull/3368
U+09D7
https://github.com/keymanapp/keyboards/blob/9e71c1676392896d92efcd536fab249ddba7fb9f/release/sil/sil_bengali_phonetic/source/sil_bengali_phonetic.kmn#L58
Why is U+09D7 on the keyboard at all? It’s the right half of ৌ but shouldn't be used separately. How is it intended to be used?
09CC;BENGALI VOWEL SIGN AU;Mc;0;L;09C7 09D7;;;;N;;;;;
09D7;BENGALI AU LENGTH MARK;Mc;0;L;;;;;N;;;;;
Maybe the idea was separate keys for ে + ক + ৗ, with that sequence then transformed into ক + ৌ (= কৌ)?
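To make the guess above concrete, here is a minimal sketch of such a recombining rule. It is not taken from the keyboard source: the function name `recombine_au` is hypothetical, the consonant range U+0995..U+09B9 is a simplification, and conjuncts and nukta are ignored.

```python
import re

# Hypothetical rule: a typed (visual-order) sequence of
#   U+09C7 VOWEL SIGN E + consonant + U+09D7 AU LENGTH MARK
# is rewritten to consonant + U+09CC VOWEL SIGN AU.
# Assumption: a single consonant from U+0995..U+09B9, no conjuncts or nukta.
VISUAL_AU = re.compile("\u09c7([\u0995-\u09b9])\u09d7")


def recombine_au(typed: str) -> str:
    """Rewrite E-sign + consonant + AU length mark to consonant + AU sign."""
    return VISUAL_AU.sub(lambda m: m.group(1) + "\u09cc", typed)


if __name__ == "__main__":
    typed = "\u09c7\u0995\u09d7"  # E sign, KA, AU length mark
    result = recombine_au(typed)
    print(" ".join(f"U+{ord(ch):04X}" for ch in result))
```

Whether the original author intended anything like this is unknown; the sketch only shows what a rule of that shape would do to the typed text.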
Question about A vs AA
From the PR linked,
@miloush asks:
Just to make it clear, the situation is:
A -> SIGN AA
SHIFT+A -> AU MARK
Q, A -> LETTER A
Q, SHIFT+A -> LETTER AA
(notably Q and long sign produce short letter unlike Q in combination with everything else)
As far as I can tell, U+09D7 is completely redundant, except perhaps when you want to quote the right half of the mātrā (combining vowel form) and write about it.
@srl295 My email to the author of this keyboard bounced. Do you want the U+09D7 removed from the keyboard, or should I just close this issue?
@LornaSIL totally up to you, I think it's fine to close if there's not time to fix and no response from author.
If it is not useful, then perhaps it should be removed - or the recombining rule should be added. Otherwise, it is an obvious confusable which has security and consistency implications (but that's part of a MUCH BIGGER discussion, and a keyboard patch is insufficient to fully address the problem -- see UTN #61 for the same discussion on Khmer which took 7 years and is still to achieve full industry acceptance)
On Windows, in Chrome, Notepad and other apps, the two sequences do appear identical:
কৌ কৌ
The only use I can think of would be to quote the right component of the vowel sign au in technical literature. That component will of course never appear without its other half in ordinary texts. Retaining it would result in two different encodings of the same vowel sign, and we would have to ensure that search engines treat the two representations as identical. If I were to retain the character, I would ensure that my keyboard always transforms the two-code-point encoding into the single code point.
On Thu, May 15, 2025 at 9:11 AM Marc Durdin @.***> wrote:
mcdurdin left a comment (keymanapp/keyboards#2446) https://github.com/keymanapp/keyboards/issues/2446#issuecomment-2882147172
If it is not useful, then perhaps it should be removed. Otherwise, it is an obvious confusable which has security and consistency implications (but that's part of a MUCH BIGGER discussion, and a keyboard patch is insufficient to fully address the problem -- see UTN #61 https://www.unicode.org/notes/tn61/ for the same discussion on Khmer which took 7 years and is still to achieve full industry acceptance)
Otherwise, it is an obvious confusable which has security and consistency implications (but that's part of a MUCH BIGGER discussion, and a keyboard patch is insufficient to fully address the problem -- see UTN #61 for the same discussion on Khmer which took 7 years and is still to achieve full industry acceptance)
My bad, this example is covered by Unicode normalization rules. See https://icu4c-demos.unicode.org/icu-bin/nbrowser?t=&s=0995+09c7+09d7&uv=0
So we should still emit NFC (and LDML keyboards will by default), but normalization-aware apps should not have trouble with this.
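For anyone who wants to check this locally rather than in the ICU browser, a small sketch using Python's standard `unicodedata` module follows. The helper name `codepoints` is just for illustration; the sequence is the one from the ICU link above (U+0995 U+09C7 U+09D7).

```python
import unicodedata


def codepoints(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


two_part = "\u0995\u09c7\u09d7"   # KA + VOWEL SIGN E + AU LENGTH MARK
precomposed = "\u0995\u09cc"      # KA + VOWEL SIGN AU

nfc = unicodedata.normalize("NFC", two_part)
nfd = unicodedata.normalize("NFD", precomposed)

if __name__ == "__main__":
    print("NFC of two-part form:", codepoints(nfc))
    print("NFD of precomposed:  ", codepoints(nfd))
    # Raw comparison sees different code points; comparison after
    # normalization treats the two spellings as the same text.
    print("raw equal:       ", two_part == precomposed)
    print("NFC-normalized equal:", nfc == unicodedata.normalize("NFC", precomposed))
```

This only covers canonical equivalence. An app that compares or searches raw code points without normalizing will still see the two spellings as different.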