Long lines in new ScriptExtensions.txt cause the space after # to disappear

Open roozbehp opened this issue 11 months ago • 2 comments

Instead of # Po MIDDLE DOT, the first data line in https://github.com/unicode-org/unicodetools/blob/main/unicodetools/data/ucd/dev/ScriptExtensions.txt reads #Po MIDDLE DOT. I see no reason to drop this space when the data is long while keeping it on lines with shorter data.

roozbehp, Mar 13 '24 01:03

I had noticed this. It is weird, but it is consistent with what we do elsewhere; note, in DerivedNormalizationProps.txt:

FDF1          ; NFKC_CF; 0642 0644 06D2 # Lo       ARABIC LIGATURE QALA USED AS KORANIC STOP SIGN ISOLATED FORM
FDF2          ; NFKC_CF; 0627 0644 0644 0647 #Lo   ARABIC LIGATURE ALLAH ISOLATED FORM
FDF3          ; NFKC_CF; 0627 0643 0628 0631 #Lo   ARABIC LIGATURE AKBAR ISOLATED FORM
FDF4          ; NFKC_CF; 0645 062D 0645 062F #Lo   ARABIC LIGATURE MOHAMMAD ISOLATED FORM
FDF5          ; NFKC_CF; 0635 0644 0639 0645 #Lo   ARABIC LIGATURE SALAM ISOLATED FORM
FDF6          ; NFKC_CF; 0631 0633 0648 0644 #Lo   ARABIC LIGATURE RASOUL ISOLATED FORM
FDF7          ; NFKC_CF; 0639 0644 064A 0647 #Lo   ARABIC LIGATURE ALAYHE ISOLATED FORM
FDF8          ; NFKC_CF; 0648 0633 0644 0645 #Lo   ARABIC LIGATURE WASALLAM ISOLATED FORM
FDF9          ; NFKC_CF; 0635 0644 0649 # Lo       ARABIC LIGATURE SALLA ISOLATED FORM

It seems to be intentional; see this comment: https://github.com/unicode-org/unicodetools/blob/6f0c77d0d2b167a67ac54a9083db9a97b2882d82/unicodetools/src/main/java/org/unicode/props/BagFormatter.java#L519-L523

I have no idea what the intention is, though. The commit that added that comment is https://github.com/unicode-org/icu/commit/cd418afef7899df376758301889b583ac9b8f849; its message is not particularly illuminating, and neither is ICU-6106. @macchiati, do you remember what you were thinking 16 years ago?

eggrobin, Mar 13 '24 01:03

Let's stop doing this

markusicu, Apr 09 '24 21:04
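
For anyone who wants to see how widespread the pattern is before changing the generator, here is a minimal Python sketch that flags data lines whose trailing comment starts with `#` immediately followed by a non-space character. It is not part of unicodetools and does not reproduce the BagFormatter logic; the function name `lines_missing_comment_space` is hypothetical, and it assumes that the first `#` on a data line starts the comment.

```python
def lines_missing_comment_space(lines):
    """Return (line_number, line) pairs for data lines whose trailing
    comment has no space after '#', e.g. '#Lo' instead of '# Lo'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Skip blank lines and whole-line comments; only data lines matter here.
        if not stripped.strip() or stripped.lstrip().startswith("#"):
            continue
        # Assumption: the first '#' on a data line starts the trailing comment.
        _, marker, comment = stripped.partition("#")
        if marker and comment and not comment[0].isspace():
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    # Two of the DerivedNormalizationProps.txt lines quoted in this thread.
    sample = [
        "FDF1          ; NFKC_CF; 0642 0644 06D2 # Lo       ARABIC LIGATURE QALA USED AS KORANIC STOP SIGN ISOLATED FORM",
        "FDF2          ; NFKC_CF; 0627 0644 0644 0647 #Lo   ARABIC LIGATURE ALLAH ISOLATED FORM",
    ]
    for number, text in lines_missing_comment_space(sample):
        print(f"{number}: {text}")
```

To check a whole file, pass an open file object instead of a list, for example `lines_missing_comment_space(open("ScriptExtensions.txt", encoding="utf-8"))`.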