Add digit separator support for binary, octal, and hexadecimal numbers
Summary
Implements digit separator support for binary (0b), octal (0o), and hexadecimal (0x) number literals, completing the digit separator feature that was previously only available for decimal numbers.
This addresses the backlog item mentioned in https://github.com/mthom/scryer-prolog/pull/3132#issuecomment-3435777584
Specification
Implements: https://www.complang.tuwien.ac.at/ulrich/iso-prolog/digit_separators
Changes
- Lexer enhancements (
src/parser/lexer.rs):- Added
skip_underscore_in_hexadecimal()helper function - Added
skip_underscore_in_octal()helper function - Added
skip_underscore_in_binary()helper function - Updated
hexadecimal_constant()to use digit separators - Updated
octal_constant()to use digit separators - Updated
binary_constant()to use digit separators
- Added
Supported Syntax
All numeric bases now support the following digit separator patterns:
- Single underscore:
0xDE_AD,0o7_6,0b10_11 - Multiple underscores:
0x1_2_3_4,0o1_2_3,0b1_0_1_0 - Underscore with whitespace:
0xFF_ 00,0o77_ 00,0b1111_ 0000 - Underscore with comments:
0xDE_ /* test */ AD,0o1_ /* octal */ 23,0b10_ /* binary */ 11
Tests
Comprehensive test coverage at three levels:
-
Rust unit tests (
src/tests/parse_tokens.rs): 16 tests- Test token-level parsing for all numeric bases
- Verify all separator variations (basic, multiple, whitespace, comments)
-
Prolog integration tests (
src/tests/digit_separators.pl): 18 tests- Test actual Prolog execution with digit separators
- Cover edge cases (large numbers, case insensitivity)
-
CLI tests (
tests/scryer/cli/src_tests/digit_separators.toml): 1 config- End-to-end command-line testing
All tests pass successfully.
Examples
?- X = 0xDE_AD, X =:= 57005.
true.
?- X = 0b1111_0000, X =:= 240.
true.
?- X = 0o77_00, X =:= 4032.
true.
?- X = 0xFF_ /* comment */ 00, X =:= 65280.
true.
Checklist
- [x] Implementation follows existing code patterns
- [x] Comprehensive test coverage (Rust unit, Prolog integration, CLI)
- [x] All tests pass
- [x] Based on current upstream/master
- [x] Follows ISO Prolog digit separator specification
Please compare this with #3125!
Please compare this with #3125!
oh thats embarrassing 😰
Not at all, this may still be useful for comparisons. For instance, to count the number of strings with length N that form a valid integer, and compare the two different implementations.
Follows ISO Prolog digit separator specification
You mean WG17.
Did you update the decimal part as well? An inconsistency (in the specification) was only discovered this week. To be clear, it was there since the 2010-11-19 proposal of O'Keefe and nobody noted it in 15 years. As a consequence, also Scryer has the following inconsistency: It supports option 9 but neither 10 nor 11. And probably it is best to reject all three. The specifications so far supported 9 and 11, but not 10 which does not make much sense, since 10 is, if at all, the only one with many digits.
(I am not sure how to proceed, and which approach is preferable. What is indeed important is the willingness to actively maintain the improvements. Like in Trealla.)
(In Scryer, I did not test number_chars for a couple of years because partial strings at that time consumed global resources which made testing impractical. Only now with their heap representation can I continue testing)
?- number_chars(N,"1_0.0").
N = 10.0, unexpected.
Did you update the decimal part as well?
Not yet.
g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -v
9a840154
g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -f
?- number_chars(N,"0xa").
error(syntax_error(lexical_error),number_chars/2), unexpected.
g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -v 9a840154 g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -f ?- number_chars(N,"0xa"). error(syntax_error(lexical_error),number_chars/2), unexpected.
cdd8f913c5bf65cc580401d0be7a851d8c029323 :+1:
?- number_chars(N,"0_").
N = 0, unexpected.
syntax_error(...).
(it would be much better, if there would be an issue-link.)
?- number_chars(N,"0_ ").
N = 0, unexpected.
syntax_error(...).
For two reasons: It is not a valid number and it is followed by a layout at the end.
@jjtolton, do you no longer continue this attempt? Note that for the moment there is no way to directly compare Scryer to the other implementations, that is Trealla and SWI.
@UWN https://github.com/mthom/scryer-prolog/commit/cdd8f913c5bf65cc580401d0be7a851d8c029323
I will check for some other cases that may not have been covered.
Ultimately we should probably go with @bakaq's approach since it is more compact and intentional, this approach is nearly entirely procedurally generated. But I am happy to provide it as a reference.
Additional test cases added: https://github.com/mthom/scryer-prolog/commit/cdd8f913c5bf65cc580401d0be7a851d8c029323#r168798457
@jjtolton, @bakaq: We unexpectedly went from a what in the administration is called a positive competence conflict (two parties independently seeing themselves as responsible) to a negative competence conflict (no party feels responsible)?
Now both #3125 and #3134 are closed even though they attempted to solve the same problem?
This was an unusual situation but for the sake of consistency I will reopen this PR because that is what @UWN ended up testing against. The tests for the combined branch are pretty good now and leave open the possibility for an improved implementation when time allows.
Fixed #3159: Digit separators now parse correctly when followed by period. Commit: 36d989bf —J.J.'s robot.