
[Myanmar] Syllable matching and punctuation

Open wezm opened this issue 2 weeks ago • 6 comments

I'm working on Myanmar shaping in Allsorts and have a query about how punctuation should be handled in syllable splitting. The Myanmar character tables list the following punctuation characters, but none of the syllable rules seem to match them.

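| Codepoint | Unicode category | Shaping class | Mark-placement subclass | Glyph |
|-----------|------------------|---------------|-------------------------|-------|
| U+104A | Punctuation | null | null | ၊ Little Section |
| U+104B | Punctuation | null | null | ။ Section |
| U+104C | Punctuation | null | null | ၌ Locative |
| U+104D | Punctuation | null | null | ၍ Completed |
| U+104F | Punctuation | null | null | ၏ Genitive |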

I've run my implementation against the text "ပို၍စောစီးစွာပေးပါက" and ၍ is tripping it up: it has no shaping class, and none of the rules in the syllable identification details match it.
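To show where the punctuation sits in that string, here is a quick Python sketch. It is not Allsorts code, and `MYANMAR_PUNCTUATION` and `find_punctuation` are names made up for illustration. It only checks against the five codepoints from the table above:

```python
import unicodedata

# The five punctuation codepoints listed in the table above.
MYANMAR_PUNCTUATION = {0x104A, 0x104B, 0x104C, 0x104D, 0x104F}

def find_punctuation(text):
    """Return (index, codepoint, Unicode name) for each listed punctuation character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) in MYANMAR_PUNCTUATION
    ]

if __name__ == "__main__":
    sample = "ပို၍စောစီးစွာပေးပါက"
    for index, codepoint, name in find_punctuation(sample):
        print(index, codepoint, name)
```

For the sample text this should report a single hit, the ၍ (U+104D) that follows the first syllable.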

There are these two notes though:

Assigned codepoints with a null in the Shaping class column evoke no special behavior from the shaping engine.

and

A sequence that does not match any of these expressions should be regarded as broken. The shaping engine may make a best-effort attempt to shape the broken sequence, but making guarantees about the correctness or appearance of the final result is out of scope for this document.

I'm wondering how these characters should be handled, since their use doesn't feel like a broken expression?

One other note: ။ and ၊ are referenced in the non-terminal _punc_ = "Little Section" | "Section"; however, _punc_ does not appear to be used anywhere. I'm wondering if that's intended?

Edit: I see the following on the OpenType Myanmar page:

Simple non-compounding cluster

< P | S | R | WJ | WS | O | D0 >

Punctuation (P), symbols (S), reserved characters from the Myanmar block (R), word joiner (WJ), white space (WS), and other SCRIPT_COMMON characters (O) contain one character per cluster.

This suggests that ၍ and friends should each be accepted as a cluster by themselves.
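As a rough illustration of that reading, here is a Python sketch in which punctuation always forms a one-character cluster and anything the syllable rules fail to match falls back to a single character. This is only an assumption about how it could work, not what either spec or Allsorts actually does. `shaping_class`, `split_clusters`, `match_syllable` and `naive_matcher` are hypothetical names, and the real syllable expressions are replaced by a toy matcher:

```python
# Only the punctuation from the table above gets class "P"; everything else
# is lumped into a placeholder class "X" (a simplification for this sketch).
PUNCTUATION = {"\u104A", "\u104B", "\u104C", "\u104D", "\u104F"}

def shaping_class(ch):
    return "P" if ch in PUNCTUATION else "X"

def split_clusters(text, match_syllable):
    """Split text into clusters.

    match_syllable(text, start) is a hypothetical hook standing in for the
    real syllable expressions. It returns the end index of the syllable that
    starts at `start`, or `start` itself if nothing matches.
    """
    clusters = []
    i = 0
    while i < len(text):
        if shaping_class(text[i]) == "P":
            end = i + 1  # punctuation: one character per cluster
        else:
            end = match_syllable(text, i)
            if end <= i:
                end = i + 1  # unmatched sequence: fall back to one character
        clusters.append(text[i:end])
        i = end
    return clusters

def naive_matcher(text, start):
    """Toy matcher: consume everything up to the next punctuation character."""
    end = start
    while end < len(text) and shaping_class(text[end]) != "P":
        end += 1
    return end

if __name__ == "__main__":
    print(split_clusters("ပို၍စော", naive_matcher))
```

With the toy matcher, the prefix "ပို၍စော" of the sample text splits into three clusters, with ၍ standing alone in the middle instead of making the whole sequence count as broken.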

wezm, Jun 21 '24 05:06