[sil_bengali_phonetic] questions about Bengali keyboard

Open srl295 opened this issue 2 years ago • 6 comments

Background

Note: I'm converting this keyboard from https://github.com/keymanapp/ldml-keyboards-dev/blob/master/keyboards/sil-bengali/bn-t-k0-cldr-phonetic.ldml to LDML 3, but I think it originally came from sil_bengali_phonetic. There are some variations, but it's pretty similar.

More comments are on https://github.com/unicode-org/cldr/pull/3368

U+09D7

https://github.com/keymanapp/keyboards/blob/9e71c1676392896d92efcd536fab249ddba7fb9f/release/sil/sil_bengali_phonetic/source/sil_bengali_phonetic.kmn#L58

Why is U+09D7 on the keyboard at all? It’s the right half of ৌ but shouldn't be used separately. How is it intended to be used?

09CC;BENGALI VOWEL SIGN AU;Mc;0;L;09C7 09D7;;;;N;;;;;
09D7;BENGALI AU LENGTH MARK;Mc;0;L;;;;;N;;;;;
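The two UnicodeData.txt records above can be checked from Python's standard `unicodedata` module. This is only a small sketch for inspecting the characters; the helper name `describe` is made up for illustration and is not part of the keyboard source.

```python
import unicodedata

AU_SIGN = "\u09cc"    # BENGALI VOWEL SIGN AU
AU_LENGTH = "\u09d7"  # BENGALI AU LENGTH MARK


def describe(ch: str) -> tuple[str, str, str]:
    """Return (name, general category, canonical decomposition) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.decomposition(ch),  # empty string when there is no decomposition
    )


for ch in (AU_SIGN, AU_LENGTH):
    print(f"U+{ord(ch):04X}", *describe(ch), sep=" | ")
```

U+09CC reports the decomposition `09C7 09D7`, matching the sixth field of its record, while U+09D7 has none.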

Maybe if there were separate keys for ে + ক + ৗ, and that sequence were transformed into ক + ৌ ( = কৌ )?
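A rough sketch of what such a transform could look like, purely to make the guess concrete. It assumes the user types the left half, then one consonant, then the right half; the consonant range and the function name `recombine_visual_order` are simplifications invented here, not rules taken from the actual .kmn or LDML source.

```python
import re

E_SIGN = "\u09c7"     # BENGALI VOWEL SIGN E (left half of AU)
AU_LENGTH = "\u09d7"  # BENGALI AU LENGTH MARK (right half of AU)
AU_SIGN = "\u09cc"    # BENGALI VOWEL SIGN AU

# Assumed input pattern: E sign, exactly one consonant, AU length mark.
# U+0995..U+09B9 (KA..HA) is a simplified consonant class for this sketch.
VISUAL_ORDER = re.compile(f"{E_SIGN}([\u0995-\u09b9]){AU_LENGTH}")


def recombine_visual_order(typed: str) -> str:
    """Rewrite E-sign + consonant + length-mark as consonant + AU sign."""
    return VISUAL_ORDER.sub(lambda m: m.group(1) + AU_SIGN, typed)
```

Conjuncts, nukta forms, and the ra and ya letters outside that range are deliberately ignored; a real keyboard rule would need to handle them.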

question about A vs AA

From the PR linked,

@miloush asks:

Just to make it clear, the situation is:

A -> SIGN AA
SHIFT+A -> AU MARK
Q, A -> LETTER A
Q, SHIFT+A -> LETTER AA

(notably Q and long sign produce short letter unlike Q in combination with everything else)

srl295 avatar Oct 31 '23 16:10 srl295

As far as I can tell, U+09D7 is completely redundant, except perhaps when you want to quote the right half of the mātrā (combining vowel form) and write about it.

gsghyd avatar Oct 28 '24 13:10 gsghyd

@srl295 My email to the author of this keyboard bounced. Do you want the U+09D7 removed from the keyboard, or should I just close this issue?

LornaSIL avatar May 14 '25 20:05 LornaSIL

@LornaSIL totally up to you. I think it's fine to close if there's no time to fix it and no response from the author.

srl295 avatar May 14 '25 20:05 srl295

If it is not useful, then perhaps it should be removed - or the recombining rule should be added. Otherwise, it is an obvious confusable which has security and consistency implications (but that's part of a MUCH BIGGER discussion, and a keyboard patch is insufficient to fully address the problem -- see UTN #61 for the same discussion on Khmer which took 7 years and is still to achieve full industry acceptance)

On Windows, in Chrome, Notepad and other apps, the two sequences do appear identical:

কৌ কৌ

mcdurdin avatar May 15 '25 03:05 mcdurdin

The only use I can think of would be to quote the right component of the vowel sign au in technical literature. That component will never appear without its other half in ordinary text. Retaining it would result in two different encodings of the same vowel sign, and we would have to ensure that search engines treat the two representations as identical. If I were to retain the character, I would ensure that my keyboard always transforms the two-code-point encoding into the single code point.
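As an illustration of that last point, here is a minimal sketch of folding the two-code-point spelling into the single code point. The function name `fold_au` is hypothetical, and a plain string replace stands in for whatever rule mechanism the keyboard would actually use.

```python
E_SIGN = "\u09c7"     # BENGALI VOWEL SIGN E
AU_LENGTH = "\u09d7"  # BENGALI AU LENGTH MARK
AU_SIGN = "\u09cc"    # BENGALI VOWEL SIGN AU


def fold_au(text: str) -> str:
    """Replace every E-sign + AU-length-mark pair with the single AU sign."""
    return text.replace(E_SIGN + AU_LENGTH, AU_SIGN)
```

A lone U+09D7 with no preceding U+09C7 is left untouched, so quoting the right-hand component on its own would still be possible.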

gsghyd avatar May 15 '25 06:05 gsghyd

Otherwise, it is an obvious confusable which has security and consistency implications (but that's part of a MUCH BIGGER discussion, and a keyboard patch is insufficient to fully address the problem -- see UTN #61 for the same discussion on Khmer which took 7 years and is still to achieve full industry acceptance)

My bad, this example is covered by Unicode normalization rules. See https://icu4c-demos.unicode.org/icu-bin/nbrowser?t=&s=0995+09c7+09d7&uv=0

So we should still emit NFC (and LDML keyboards will by default), but normalization-aware apps should not have trouble with this.
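The same check as the ICU normalization browser link can be reproduced with Python's standard library. This is a small sketch, with `same_text` as a made-up helper name, showing how a normalization-aware comparison treats the two spellings.

```python
import unicodedata

two_part = "\u0995\u09c7\u09d7"  # KA + VOWEL SIGN E + AU LENGTH MARK
single = "\u0995\u09cc"          # KA + VOWEL SIGN AU


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(two_part == single)           # raw code point comparison
print(same_text(two_part, single))  # comparison after NFC
```

The raw comparison differs while the NFC comparison matches, which is why apps that skip normalization can still be tripped up by the two-part spelling.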

mcdurdin avatar May 16 '25 15:05 mcdurdin