doc: Update pygments syntax highlight for numbers
This issue is not caused by the project but is related to it. We need to update the regular expressions used by the Pygments library to take into account the following updates to Open Dylan:
- DEP#11 which allows the use of underscore characters between any two digits.
- Double float numbers (not considered until now).
The solution could be discussed here until a PR is proposed to the Pygments project.
Current regexp:
https://github.com/pygments/pygments/blob/edef94d66c2d70f05a86ac6098a69ab253b8d946/pygments/lexers/dylan.py#L140
The expressions below are case insensitive.
Include _ in binary numbers:
#b[01]+(?:_[01]+)*
Or require trailing whitespace (\s already covers carriage return) or end of line after the literal:
#b[01]+(?:_[01]+)*(\s+|\r)
#b[01]+(?:_[01]+)*$
Include _ in octal numbers (similar to binary):
#o[0-7]+(?:_[0-7]+)*
or
#o[0-7]+(?:_[0-7]+)*(\s+|\r)
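As a quick check of the binary and octal proposals, here is a minimal sketch using Python's re module. The helper name literal_prefix and the sample inputs are made up for illustration; the patterns are the unanchored ones above, compiled with IGNORECASE on the assumption that this mirrors the lexer's flag.

```python
import re

# Patterns copied from the proposal above; IGNORECASE is assumed to mirror the lexer.
BINARY = re.compile(r"#b[01]+(?:_[01]+)*", re.IGNORECASE)
OCTAL = re.compile(r"#o[0-7]+(?:_[0-7]+)*", re.IGNORECASE)


def literal_prefix(pattern, text):
    """Return the text matched starting at position 0, or None if nothing matches."""
    m = pattern.match(text)
    return m.group(0) if m else None


# Hypothetical sample inputs, not taken from the Open Dylan test suite.
for sample in ["#b1010", "#B1010_0101", "#b10__01", "#b_1", "#b1_"]:
    print(f"{sample!r:14} -> {literal_prefix(BINARY, sample)!r}")
for sample in ["#o755", "#o7_5_5", "#o78"]:
    print(f"{sample!r:14} -> {literal_prefix(OCTAL, sample)!r}")
```

Without a trailing check, the unanchored patterns still match the valid prefix of a malformed literal (#b10 out of #b10__01, #o7 out of #o78), which is what the variants with a trailing whitespace or end-of-line test are meant to prevent.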
Hexadecimal numbers test cases:
#xff
#xdead_beef
#xdead_beef_
#xb_e_e_f
#xbe__ef
#x_beef_
#xh
#x + 1
#xff,#xff, #xff
(#xff)
#x[0-9a-f]+(?:_[0-9a-f]+)*$
or
#x[0-9a-f]+(?:_[0-9a-f]+)*($|[^0-9a-f_])
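To compare the two hexadecimal patterns on the test cases above, here is a small sketch in Python. It assumes search semantics (re.finditer, roughly what an online regex tester does) and treats each test case as its own string, so $ means the end of that string. The names HEX_EOL, HEX_DELIM and count_matches are illustrative only.

```python
import re

# Both patterns are copied from the proposal; IGNORECASE is assumed to mirror the lexer.
HEX_EOL = re.compile(r"#x[0-9a-f]+(?:_[0-9a-f]+)*$", re.IGNORECASE)
HEX_DELIM = re.compile(r"#x[0-9a-f]+(?:_[0-9a-f]+)*($|[^0-9a-f_])", re.IGNORECASE)

# Test cases from the list above, one string per case.
CASES = [
    "#xff",
    "#xdead_beef",
    "#xdead_beef_",
    "#xb_e_e_f",
    "#xbe__ef",
    "#x_beef_",
    "#xh",
    "#x + 1",
    "#xff,#xff, #xff",
    "(#xff)",
]


def count_matches(pattern, text):
    """Number of non-overlapping matches found anywhere in text."""
    return sum(1 for _ in pattern.finditer(text))


for case in CASES:
    eol = count_matches(HEX_EOL, case)
    delim = count_matches(HEX_DELIM, case)
    print(f"{case!r:20} end-of-line: {eol}  delimiter: {delim}")
```

Under these assumptions both patterns accept #xff, #xdead_beef and #xb_e_e_f and reject the five malformed cases. They differ only on the last two lines: the $ version finds one literal in #xff,#xff, #xff and none in (#xff), while the delimiter version finds three and one.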
Nice. I'm not sure how Pygments invokes the regex, but just in case:
- I assume it uses a case-insensitive test? It needs to work for #x, #X, and [A-F].
- If it gives you more text than just the token it wants to match against, you might need to use something like this at the end instead of $: ($|[^0-9a-f_]) (More accurately, I guess it needs to check for $ or any delimiter characters (like comma, close paren, space) that could terminate the literal, but it doesn't have to be perfect so I don't know if it's worth enumerating those.)
- Yes, the regular expressions Pygments uses for Open Dylan are case insensitive (as seen in this line https://github.com/pygments/pygments/blob/edef94d66c2d70f05a86ac6098a69ab253b8d946/pygments/lexers/dylan.py#L33). Just yesterday I added a comment at the beginning saying that the regular expressions below are case insensitive (sorry, I should have put that earlier).
- The expression ($|[^0-9a-f_]) is more precise, although looking at other examples of hexadecimal regular expressions it seems that most of them use $. I don't see any problem with using yours. I've added more test cases to see the difference.
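One thing to watch with the ($|[^0-9a-f_]) ending: the delimiter becomes part of the matched text. The sketch below assumes the lexer tries each pattern with match() at the current position in the full source text; that is my understanding of Pygments' RegexLexer, but I have not verified it against the code. The negative lookahead variant is an alternative not proposed in this thread, and the names CONSUMING, LOOKAHEAD and token_at are hypothetical.

```python
import re

FLAGS = re.IGNORECASE  # assumed to mirror the lexer's case-insensitive flag

# Proposed ending: the terminating character is consumed as part of the match.
CONSUMING = re.compile(r"#x[0-9a-f]+(?:_[0-9a-f]+)*($|[^0-9a-f_])", FLAGS)
# Alternative (an assumption, not from the thread): check the next character
# with a negative lookahead so it stays outside the match.
LOOKAHEAD = re.compile(r"#x[0-9a-f]+(?:_[0-9a-f]+)*(?![0-9a-f_])", FLAGS)


def token_at(pattern, text, pos):
    """Return the text matched starting exactly at pos, or None."""
    m = pattern.match(text, pos)
    return m.group(0) if m else None


source = "(#XFF, #xdead_beef_)"
first = source.index("#")           # position of "#XFF"
second = source.index("#", first + 1)  # position of "#xdead_beef_"

print(token_at(CONSUMING, source, first))   # includes the trailing comma
print(token_at(LOOKAHEAD, source, first))   # only the literal
print(token_at(CONSUMING, source, second))  # malformed literal is rejected
print(token_at(LOOKAHEAD, source, second))  # also rejected
```

If the consumed delimiter ends up highlighted as part of the number, the lookahead form may be the safer choice; both reject the same malformed literals in this example.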
Floating point numbers.
Our current RE [-+]?(\d*\.\d+(e[-+]?\d+)?|\d+(\.\d*)?e[-+]?\d+) does not match some of the cases in the literal test suite: