user-documentation
Some Unicode (or non-7-bit ASCII) characters cause grief
a. Any paragraph of text or Hack … -delimited example containing an ellipsis (U+2026), left double quote (U+201C), or right double quote (U+201D) is rendered as a blank line.
b. Em-dash (U+2014) and en-dash (U+2013) cause text to be swallowed up with no output.
c. I have cross-references of the form §§, but rather than displaying §§ linked to xxx, the whole construct is swallowed up with no output. BTW, § is U+00A7, so its high bit is set, putting it outside the 7-bit ASCII range.
Are all code points > U+007F handled in this manner?
BTW, I discovered these when pasting text from MS-Word. I've replaced each of these characters with ones that are accepted, but it took me a while to figure out why they "disappeared into the void".
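For anyone hitting the same problem, here is a minimal Python sketch for locating code points above U+007F in a Markdown source and swapping the ones named in this report for ASCII stand-ins. The function names (`find_non_ascii`, `replace_known`) and the `ASCII_FALLBACKS` table are hypothetical, made up for illustration; they are not part of the user-documentation build, and the replacements chosen are only one reasonable option.

```python
import unicodedata

# Hypothetical fallback table covering the characters named in this report.
# U+00A7 (section sign) is deliberately left out: it has no obvious ASCII stand-in.
ASCII_FALLBACKS = {
    "\u2026": "...",  # horizontal ellipsis
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2014": "--",   # em dash
    "\u2013": "-",    # en dash
}


def find_non_ascii(text):
    """Return (line, column, code point, Unicode name) for each character above U+007F."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits


def replace_known(text):
    """Replace only the characters listed in ASCII_FALLBACKS; leave everything else alone."""
    for ch, replacement in ASCII_FALLBACKS.items():
        text = text.replace(ch, replacement)
    return text
```

To use it on a file, read the file with `open(path, encoding="utf-8").read()`, print the result of `find_non_ascii` to see where the offending characters are, and write the result of `replace_known` back out if the substitutions are acceptable.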
What does locale output on your server?
Does running the following before building help?
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
Here's my locale when I log on:
ubuntu@ip-172-31-36-66:~$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
I changed it, as follows:
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
Here are the locale settings:
ubuntu@ip-172-31-36-66:~/user-documentation/public$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
These changes had no effect; the characters in question (and the surrounding text) are still not rendered.
- Can you create a pull request with a complete example? I'm unable to reproduce this in isolation.
- Does the result depend on which browser you are using?
Here's my test md file (with suffix .txt added to accommodate the upload constraint):
Non-ASCII Character Tests.md.txt
And here's the captured display for the first few tests:
The Word versions (that is, with text copied straight from MS Word) of each test result in a blank line (except for the section marker, which doesn't render correctly either) on both Chrome and my old IE (I'm running Win8.1).
I did not change any environment variables; I just used the defaults (whose values I reported yesterday).