opendylan

doc: Update pygments syntax highlight for numbers

Open fraya opened this issue 10 months ago • 6 comments

This issue is not caused by the project but is related to it. We need to update the regular expressions used by the Pygments library to account for the following changes to Open Dylan:

  • DEP#11, which allows the use of underscore characters between any two digits.
  • Double float numbers (not handled until now).

The solution can be discussed here until a PR is proposed to the Pygments project.

Current regexp:

https://github.com/pygments/pygments/blob/edef94d66c2d70f05a86ac6098a69ab253b8d946/pygments/lexers/dylan.py#L140

The expressions below are case-insensitive.

fraya avatar Feb 04 '25 07:02 fraya

Include _ in binary numbers:

#b[01]+(?:_[01]+)*

Matches are shown highlighted

Image

Or also require whitespace, or the end of the line, after the number:

#b[01]+(?:_[01]+)*(\s+|\r)
#b[01]+(?:_[01]+)*$

Image

fraya avatar Feb 07 '25 10:02 fraya

Include _ in octal numbers (similar to binary):

#o[0-7]+(?:_[0-7]+)*

or

#o[0-7]+(?:_[0-7]+)*(\s+|\r)
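
Since the binary and octal expressions differ only in the prefix letter and the digit class, they could be generated from one template. The following is a minimal sketch, not something taken from the Pygments source; `radix_literal` is a hypothetical helper name, and the sample inputs are made up for illustration.

```python
import re

def radix_literal(prefix: str, digits: str) -> str:
    """Hypothetical helper: digits, then any number of '_digits' groups."""
    return rf"#{prefix}{digits}+(?:_{digits}+)*"

BINARY = radix_literal("b", "[01]")
OCTAL = radix_literal("o", "[0-7]")

# The helper reproduces the two expressions proposed above.
assert BINARY == r"#b[01]+(?:_[01]+)*"
assert OCTAL == r"#o[0-7]+(?:_[0-7]+)*"

# Case-insensitive, as stated at the top of the thread.
octal_re = re.compile(OCTAL, re.IGNORECASE)

# fullmatch checks the whole string, so malformed literals are rejected
# instead of being matched as a shorter prefix.
for text in ["#o17", "#O7_7", "#o7__7", "#o_7", "#o8"]:
    print(text, bool(octal_re.fullmatch(text)))
```

Only the first two samples are accepted: a doubled underscore, a leading underscore, and a digit outside the radix all fail the full match.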

fraya avatar Feb 07 '25 10:02 fraya

Hexadecimal numbers test cases:

#xff
#xdead_beef
#xdead_beef_
#xb_e_e_f
#xbe__ef
#x_beef_
#xh
#x + 1
#xff,#xff, #xff
(#xff)

#x[0-9a-f]+(?:_[0-9a-f]+)*$

Image

or

#x[0-9a-f]+(?:_[0-9a-f]+)*($|[^0-9a-f_])

Image
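
The difference between the two expressions can be reproduced with Python's `re` module. This is a sketch that treats each test case as a separate line and searches it the way an online regex tester would; it is not how Pygments itself applies the rule. The names `HEX_EOL`, `HEX_DELIM` and `matches` are made up for the example.

```python
import re

CASES = ["#xff", "#xdead_beef", "#xdead_beef_", "#xb_e_e_f", "#xbe__ef",
         "#x_beef_", "#xh", "#x + 1", "#xff,#xff, #xff", "(#xff)"]

# Variant 1: the literal must run to the end of the line.
HEX_EOL = re.compile(r"#x[0-9a-f]+(?:_[0-9a-f]+)*$", re.IGNORECASE)
# Variant 2: end of line, or one character that cannot continue the literal.
HEX_DELIM = re.compile(r"#x[0-9a-f]+(?:_[0-9a-f]+)*($|[^0-9a-f_])", re.IGNORECASE)

def matches(pattern, line):
    """Return the text of every non-overlapping match found in line."""
    return [m.group(0) for m in pattern.finditer(line)]

for line in CASES:
    print(repr(line), matches(HEX_EOL, line), matches(HEX_DELIM, line))
```

Both variants reject `#xdead_beef_`, `#xbe__ef`, `#x_beef_`, `#xh` and `#x + 1`. They differ on the last two cases: the `$` variant finds only the final `#xff` in the comma-separated line and nothing in `(#xff)`, while the second variant finds every literal but includes the terminating `,` or `)` in the matched text.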

fraya avatar Feb 14 '25 18:02 fraya

Nice. I'm not sure how Pygments invokes the regex, but just in case:

  1. I assume it uses a case-insensitive test? It needs to work for #x, #X, and [A-F].
  2. If it gives you more text than just the token it wants to match against, you might need to use something like this at the end instead of $: ($|[^0-9a-f_]) (More accurately, I guess it needs to check for $ or any delimiter characters (like comma, close paren, space) that could terminate the literal, but it doesn't have to be perfect so I don't know if it's worth enumerating those.)
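
One caveat with `($|[^0-9a-f_])` is that the delimiter becomes part of the match, so a highlighter that colors the whole match would color the comma or parenthesis too. A possible alternative, offered here as an assumption rather than a tested Pygments change, is a negative lookahead, which checks the next character without consuming it. The sketch below uses `match` at a given position, on the assumption that this resembles how a lexer tries a rule at the current offset; `CONSUMING` and `LOOKAHEAD` are names invented for the example.

```python
import re

# Delimiter is consumed and ends up inside the matched text.
CONSUMING = re.compile(r"#x[0-9a-f]+(?:_[0-9a-f]+)*($|[^0-9a-f_])", re.IGNORECASE)
# Delimiter is only inspected: the next character must not continue the literal.
LOOKAHEAD = re.compile(r"#x[0-9a-f]+(?:_[0-9a-f]+)*(?![0-9a-f_])", re.IGNORECASE)

text = "(#XFF_ff)"

# Try both rules starting at offset 1, just past the opening parenthesis.
print(CONSUMING.match(text, 1).group(0))   # includes the closing parenthesis
print(LOOKAHEAD.match(text, 1).group(0))   # stops before it

# Malformed literals are still rejected by the lookahead variant.
print(LOOKAHEAD.match("#xbe__ef"))
```

The same sample also exercises the case-insensitive flag, since it mixes `#X`, upper-case and lower-case digits.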

cgay avatar Feb 14 '25 18:02 cgay

  1. Yes, the regular expressions Pygments uses for Open Dylan are case-insensitive (as seen in this line https://github.com/pygments/pygments/blob/edef94d66c2d70f05a86ac6098a69ab253b8d946/pygments/lexers/dylan.py#L33). Just yesterday I added a comment at the beginning saying that the regular expressions below are case-insensitive (sorry, I should have put that earlier).
  2. The expression ($|[^0-9a-f_]) is more precise, although most other examples of hexadecimal regular expressions I have looked at use $. I don't see any problem with using yours. I've added more test cases to show the difference.

fraya avatar Feb 15 '25 09:02 fraya

Floating-point numbers. Our current RE [-+]?(\d*\.\d+(e[-+]?\d+)?|\d+(\.\d*)?e[-+]?\d+) does not match some of the literals in the test suite:

Image
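
A quick way to see which kinds of literal the current expression misses is to run it over a few samples. The samples below are assumptions chosen to reflect the two changes named at the top of this issue (a `d` exponent marker for double floats and underscores between digits); they are not taken from the literal test suite shown in the image.

```python
import re

# The current float expression quoted above, compiled case-insensitively.
CURRENT_FLOAT = re.compile(
    r"[-+]?(\d*\.\d+(e[-+]?\d+)?|\d+(\.\d*)?e[-+]?\d+)", re.IGNORECASE)

# Assumed samples: three plain floats, one double float, one with an underscore.
SAMPLES = ["1.5", "-1.5e10", "3e-2", "1.0d0", "1_000.5"]

for literal in SAMPLES:
    # fullmatch: the whole literal must be covered, not just a prefix.
    print(literal, bool(CURRENT_FLOAT.fullmatch(literal)))
```

The first three samples match in full, while `1.0d0` and `1_000.5` do not, because the expression knows only `e` as an exponent marker and has no place for `_`.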

fraya avatar Mar 22 '25 10:03 fraya