Stata displays #chars per line by bytes, not number of characters

Open kylebarron opened this issue 7 years ago • 3 comments

Problem description

Related to #161: every (?) non-ASCII Unicode character printed in the console is encoded with more than one byte. So when a line contains 5 Chinese characters, Stata displays 75 characters on that line, not 80.

This messes up hiding code lines.

Possible fixes

I could try to count the number of multi-byte characters that I send to Stata, and adjust the string I expect Stata to return accordingly.
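The counting step could be sketched roughly as follows, assuming the code sent to Stata is UTF-8 encoded. The helper name `count_multibyte` is hypothetical and is not part of stata_kernel; it only illustrates the idea.

```python
def count_multibyte(text: str) -> int:
    """Count characters whose UTF-8 encoding takes more than one byte."""
    return sum(1 for ch in text if len(ch.encode("utf-8")) > 1)


# Illustrative fragment; not taken from stata_kernel itself.
sent = 'xtitle("交易日")'
n_chars = len(sent)                  # number of Unicode code points
n_bytes = len(sent.encode("utf-8"))  # number of UTF-8 bytes
n_multi = count_multibyte(sent)      # characters that are not plain ASCII
print(n_chars, n_bytes, n_multi)
```

The difference between these counts is what the expected echo from Stata would have to account for.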

kylebarron, Sep 12 '18 13:09

Take the following text:

line mcar car100 car300 trddt, ytitle("CAR(等权)") xtitle("交易日") legend(label

In Stata, the Unicode string length of this text (ustrlen()) is 75, while its Unicode display length (udstrlen()) is 80. The udstrlen() value is what Stata must use to determine how many characters to display on a line.

Apparently CJK characters each need two display columns.
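The 75 versus 80 figures can be reproduced outside Stata. This is a minimal sketch using Python's standard `unicodedata` module; it assumes that characters classed as East Asian Wide or Fullwidth take two columns and everything else takes one, and it ignores zero-width and control characters. The function name `display_width` is made up for illustration.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate display columns: 2 for East Asian Wide/Fullwidth, else 1."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


line = 'line mcar car100 car300 trddt, ytitle("CAR(等权)") xtitle("交易日") legend(label'

n_chars = len(line)                  # code points
n_cols = display_width(line)         # display columns
n_bytes = len(line.encode("utf-8"))  # UTF-8 bytes
print(n_chars, n_cols, n_bytes)
```

For this line the character count is 75 and the column count is 80, while the UTF-8 byte count is 85, so the 80-column limit tracks display columns rather than bytes.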

kylebarron, Sep 15 '18 16:09

According to Wikipedia, the CJK Unified Ideographs Unicode block spans U+4E00..U+9FFF.

kylebarron, Sep 15 '18 16:09

Python library that measures the width of unicode strings rendered to a terminal: https://github.com/jquast/wcwidth
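A hedged sketch of how that library might be used here: `wcswidth` is wcwidth's function for measuring a whole string, and it returns -1 when the string contains non-printable characters. The wrapper `columns` is a hypothetical name. The fallback for when wcwidth is not installed is an assumption of this sketch: it counts two columns only for code points in U+4E00..U+9FFF, which misses other wide characters such as fullwidth punctuation.

```python
try:
    from wcwidth import wcswidth  # third-party package: pip install wcwidth
except ImportError:
    wcswidth = None


def columns(text: str) -> int:
    """Display columns for text, preferring wcwidth when it is available."""
    if wcswidth is not None:
        width = wcswidth(text)
        if width >= 0:  # -1 signals a non-printable character
            return width
    # Crude fallback: 2 columns for U+4E00..U+9FFF, 1 for everything else.
    return sum(2 if 0x4E00 <= ord(ch) <= 0x9FFF else 1 for ch in text)


print(columns("交易日"), columns("trddt"))
```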

kylebarron, Oct 01 '18 21:10