
Problem with diacritics

Open sorawee opened this issue 3 years ago • 7 comments

DrRacket can't display Thai diacritics (and probably those of other languages with diacritics) correctly in the code editor.

Screen Shot 2021-04-04 at 11 08 52 PM

Here's how it should be displayed:


(กำหนด ความกว้าง 500)


FWIW, Emacs is able to display it correctly.

Screen Shot 2021-04-04 at 11 13 36 PM

@mbutterick's quad used to have an issue with diacritics too (though it's a different problem), so let me @ you in case you have an idea what could go wrong.

sorawee avatar Apr 04 '21 16:04 sorawee

I am guessing this is an issue with either text% or perhaps the drawing libraries (accessed via, e.g., canvas-dc%), but maybe on a non-Mac platform? Or maybe a specific font? (It looks okay to me.)

Here's some code that might reproduce the issue outside of DrRacket (if it isn't a font-specific issue).

#lang racket/gui
(define s "กำหนด ความกว้าง")
(define t (new text%))
(define f (new frame% [label ""][width 300] [height 300]))
(define ec (new editor-canvas% [parent f] [editor t]))
(send t insert s)
(send f show #t)

rfindler avatar Apr 04 '21 17:04 rfindler

Sorry, should have mentioned that I'm on Mac. The program that you provided above does reproduce the issue, though weirdly, "กำ" is now displayed correctly! "กว้าง" is still incorrect however.

Screen Shot 2021-04-05 at 6 18 40 AM

This is not a font-specific issue IIUC. Even with the font TH Sarabun New (the standard font for Thai script), the issue persists in DrRacket.

Screen Shot 2021-04-05 at 6 21 04 AM

Here's how it displays in word processing software.

Screen Shot 2021-04-05 at 6 21 44 AM

sorawee avatar Apr 04 '21 23:04 sorawee

I think the problem is more generally with Unicode combining characters:

#lang racket/base

(define chars '(#\e #\u0301))
(displayln chars)
(displayln (list->string chars))
(newline)

(define precomposed-chars
  ((compose string->list string-normalize-nfc list->string)
   chars))
(displayln precomposed-chars)
(displayln (list->string precomposed-chars))

97jaz avatar Apr 05 '21 17:04 97jaz

Related? https://github.com/racket/draw/issues/22 According to a comment in this issue, DrRacket always uses #f for the combine? parameter to the draw-text method of dc<%>. And the code has this comment: https://github.com/racket/draw/blob/a4e156abe5119309783443495d671b9a7f3e434b/draw-lib/racket/draw/private/dc.rkt#L1493
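
To make the combine? point concrete, here is a minimal sketch (not taken from DrRacket's source) that draws the same Thai word twice on an offscreen bitmap, once with combine? as #f and once as #t, so the two renderings can be compared outside the editor. The bitmap size, coordinates, and output file name are arbitrary choices for illustration, and whether the two rows actually differ will depend on the platform and font.

```racket
#lang racket/base
(require racket/class racket/draw)

;; The word reported as misrendered earlier in this thread.
(define s "กว้าง")

;; Offscreen drawing target; the size is arbitrary.
(define bm (make-bitmap 300 120))
(define dc (new bitmap-dc% [bitmap bm]))

;; Fourth argument of draw-text is combine?.
(send dc draw-text s 10 10 #f) ; top row: combine? = #f
(send dc draw-text s 10 60 #t) ; bottom row: combine? = #t

;; get-text-extent takes the same combine? flag (third argument),
;; so the measured widths can be compared as well.
(define (text-width combine?)
  (define-values (w h descent extra)
    (send dc get-text-extent s #f combine?))
  w)

(printf "width with combine? = #f: ~a\n" (text-width #f))
(printf "width with combine? = #t: ~a\n" (text-width #t))

;; Hypothetical output file name; open the PNG to inspect both rows.
(send bm save-file "combine-test.png" 'png)
```

If the bottom row looks right and the top row does not, that would point at the combine? setting rather than at the font.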

97jaz avatar Apr 05 '21 20:04 97jaz

In the latest version of DrRacket, things are a bit flipped. Running @rfindler's program, we will get:

Screen Shot 2022-01-11 at 5 35 29 PM

where กำ, which consists of two characters ก and ำ, is displayed without the circle on top of ก. Note though that กว้าง is now displayed correctly.

It's somewhat weird, because this display problem only occurs when I choose not to "normalize" when pasting the code in. If I normalize, I do get the desired display, but now กำ becomes 3 characters: ก, ํ, and า, which is incorrect in Thai. ำ is one character, and is not equivalent to ํ + า.
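
The change in character count can be checked directly. The sketch below assumes the paste normalization is NFKC (an assumption, though it is consistent with the three-character result described above): SARA AM (U+0E33) has a compatibility decomposition into NIKHAHIT (U+0E4D) followed by SARA AA (U+0E32), so NFKC splits it, while NFC leaves it alone. The helper name code-points is made up for this example.

```racket
#lang racket/base

;; "กำ" as typed: ก (U+0E01) followed by ำ (U+0E33, SARA AM).
(define s "\u0E01\u0E33")

;; Hypothetical helper: list each character's code point in hex.
(define (code-points str)
  (for/list ([c (in-string str)])
    (number->string (char->integer c) 16)))

;; NFC only applies canonical mappings, so the string is unchanged.
(displayln (code-points (string-normalize-nfc s)))

;; NFKC also applies compatibility mappings, which split SARA AM
;; into NIKHAHIT + SARA AA.
(displayln (code-points (string-normalize-nfkc s)))

(displayln (string-length s))                         ; 2
(displayln (string-length (string-normalize-nfkc s))) ; 3
```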

Screen Shot 2022-01-11 at 5 40 17 PM

sorawee avatar Jan 12 '22 01:01 sorawee

I wanted to try this again after the recent Unicode change, and just noticed a couple more issues (which already existed before the Unicode change).

Steps to reproduce:

  • Paste (ความกว้าง 500) into DrRacket. Notice that the number 500 is not syntax-highlighted correctly. (Screen Shot 2022-07-17 at 10 11 58 PM)
  • Move the caret to the right parenthesis, and press the left arrow key multiple times. Somehow it gets stuck at the end of "ความกว้าง".

sorawee avatar Jul 18 '22 05:07 sorawee

The problem with (ความกว้าง 500) should be fixed by the snip-lib commit.

mflatt avatar Jul 18 '22 14:07 mflatt