Relegating Unicode characters in examples to the Unicode section
[Originally requested by mistake under raku repo.]
Problem or new feature
Raku’s ability to use Unicode throughout the language is very impressive, but many examples in the documentation rely on characters that cannot be typed on most (or, in some cases, any) standard keyboards. This makes those examples difficult to reproduce and makes learning Rakudo more difficult. To reproduce such an example, I must either know the Unicode names or codepoints, give up and copy/paste the characters, or learn to translate what the author wrote (for the sake of novelty) into what I will actually be typing. Examples like that are unhelpful.
In many places, the primary purpose of these symbols appears to be demonstrating that Raku can interpret arbitrary Unicode in code. It's cool, sure, but it is rarely (if ever) relevant to users trying to understand a language construct.
FWIW, this was one of the reasons I steered away from Rakudo. I had to understand what each character in each example actually meant. ("Is it 《 or is it ⟪ or is it «, and do I use "" or '' or << or (( to replicate it? Why is it part of a say statement?") It was frustrating that every example was trying to show off something unrelated to the topic at hand, and I couldn't see myself writing it.
Partially driving the point home and partially letting out some more steam: if Raku supported syntax written in non-printable control characters or even audio input, those would be fascinating capabilities worth documenting. But they would not belong in everyday examples meant to teach loops, classes, or basic operators.
Suggestions
To improve accessibility and clarity, I suggest:
- Removing Unicode or novelty characters from examples unless they have a relevant value.
- Replacing such examples in the rest of the documentation with ASCII-friendly versions that users can type directly or copy/paste cleanly, so users focus on what's being explained.
- Making sure the Unicode section encompasses all relevant examples that showcase this ability.
That would let Rakudo's main documentation be practical and approachable, and I could stop complaining about it. :)
In the Rakudo version of the issue, it was suggested to have a toggle that would allow one to switch between ASCII and Unicode versions of identifiers.
I think this could be implemented in a JS frontend thing with a conversion map, assuming the generated docs would only contain ASCII versions?
By delaying this until the actual rendering, we wouldn't have to worry about:
- adding some logic in rakudoc for handling this
- fixing the renderer to handle that additional logic
Pinging @finanalyst @thoughtstream about this idea.
I have a few observations. TL;DR: I agree.
Firstly
I have been on this journey and ended up at the same place. In the Physics::Measure README.md, I recently wrote:
Note: The caret prefix ^ can now be used as an alternative to the libra prefix ♎️ to ease typing. Also the tilde ~ has been added as an alternative to ± to introduce an Error term.
Secondly
Generally, I am against gadgets on our websites. UX is a very hard thing to do well (especially on the web) and we will already have global controls for Light/Dark and for Lang [EN, FR] (and actually separate ones for interface lang and content lang are currently envisioned... hmmm). Reaching for a light/dark toggle is a very common habit, but reaching for a dropdown to toggle unicode/non-unicode is not a natural UX gesture.
Thirdly
Really, for core documentation principles, I think we need to decide them and then stick to them. An additional negative of a gadget is that it allows us to sit on the fence, where really we need to show the language in its best, simplest light.
Summary
So, I hope that we can achieve strong consensus on this proposal and then (if that's the way it rolls) have a simple mechanism (e.g., tooltips?) for those who want to opt in to unicode to see that alternative in context - and a unicode glossary addendum to that section of the docs.
PS. I realize that this may be a feature on the new raku.org site, for example:
method area { π * $.radius² }
Please let me know if you think we should fix those instances!
I'd like to echo practically everything @librasteve says. To highlight specifics:
UX is a very hard thing to do well (especially on the web)
No need to add complexity unless it adds value.
Really, for core documentation principles, I think we need to decide them and then stick to them.
Simplicity, consistency.
[...] we need to show the language in its best, simplest light.
💯 %
[...] for those who want to opt in to unicode to see that alternative in context [...]
I think that if the ability to use a Unicode character as syntax is relevant to where it's used, it adds benefit. And if it doesn't add specific benefit where it's displayed, there's no point in having it there. E.g., if I'm reading about loop constructs, I probably don't care about π, but if I'm reading about how Raku can leverage π (the character and the value) to help with mathematical/scientific calculations, that's really good to see (and quite impressive).
The example @librasteve provides of:
method area { π * $.radius² }
...is a good example of when the Unicode character really adds value, and where a tooltip version (one showing the Unicode, one not - no opinion on which should be the tooltip, except that it stays consistent) can be very helpful.
I would even just put it in a commented line:
method area { π * $.radius² }
# Alternatively: method area { ... }
(I would've written the content, but I have no idea how to express these in ASCII.)
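For reference, a minimal sketch of one ASCII spelling. It assumes the method lives in a hypothetical `Circle` class with a `radius` attribute (neither the class name nor the `area-ascii` method name appears in the original snippet). In Raku, `pi` is the ASCII name for `π`, and `** 2` can stand in for the superscript `²`.

```raku
# Hypothetical class, used only to make the snippet self-contained
class Circle {
    has $.radius;

    # Unicode spelling, as in the example above
    method area       { π * $.radius² }

    # ASCII spelling of the same expression
    method area-ascii { pi * $.radius ** 2 }
}

my $c = Circle.new(radius => 2);
say $c.area == $c.area-ascii;   # True
```

With that, the commented line could read `# Alternatively: method area { pi * $.radius ** 2 }`.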
FWIW, I've been fairly happy cutting-and-pasting from the Raku docs into iTerm2, MacVim, etc.
Set Functions just look way better in Unicode than ASCII.
~$ raku -e 'my $fruits = set <peach apple orange apple apple>; \
say "apple" ∈ $fruits;'
True
Also, if you're interpolating strings or playing with regexen then Unicode names are your friends. Time to learn them!
~$ raku -e 'my $fruits = set <peach apple orange apple apple>;
say "\"apple\" \c[ELEMENT OF] ($fruits)";'
"apple" ∈ (orange peach apple)
TL;DR
It's a big feature of the language and it works really well so it should be front-and-center. So I disagree with @xsawyerx.
@doomvox
It's a big feature of the language and it works really well so it should be front-and-center.
If the goal is to support people who are still learning the language, I think using Unicode sparingly while they are getting familiar with the core language constructs makes more sense.
I do not have empirical data to cite, but I have heard this concern raised fairly often. I am simply sharing this recurring piece of feedback I have encountered, and one I also share, when learning Raku.
If you find that the Unicode characters in the syntax (when learning about non-Unicode related stuff) are helpful for beginners learning language constructs, there's no reason to change it.
FWIW, though, I think "apple" ∈ $fruits is a good example of when Unicode is relevant, just like @librasteve's example of the area method. The goal wasn't to remove all cases, but only irrelevant ones.
Actually, "apple" ∈ $fruits is one of the cases I was going to raise, because, as you say, it is a big improvement on some kind of loop and test, and it is very easy to read and gather the intent. For the Set ops, the ASCII versions are less easy to read, and since we write once and read many times, the extra effort of typing the Unicode is a cost worth paying in this case.
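To make the readability comparison concrete, here is a small sketch that puts a few Unicode Set operators next to their ASCII spellings. The `$citrus` set is made up for illustration and is not part of the earlier examples.

```raku
my $fruits = set <peach apple orange>;
my $citrus = set <orange lemon>;      # hypothetical second set

# Membership: ∈ and (elem) are the same operator
say "apple" ∈ $fruits;                # True
say "apple" (elem) $fruits;           # True

# Intersection: ∩ and (&)
say ($fruits ∩ $citrus).elems;        # 1
say ($fruits (&) $citrus).elems;      # 1

# Union: ∪ and (|)
say ($fruits ∪ $citrus).elems;        # 4
say ($fruits (|) $citrus).elems;      # 4

# Subset test: ⊆ and (<=)
say $citrus ⊆ $fruits;                # False
say $citrus (<=) $fruits;             # False
```

Both spellings behave identically, so the choice between them is only about how the example reads and how easily it can be typed.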