scryer-prolog icon indicating copy to clipboard operation
scryer-prolog copied to clipboard

Add digit separator support for binary, octal, and hexadecimal numbers

Open jjtolton opened this issue 2 months ago • 18 comments

Summary

Implements digit separator support for binary (0b), octal (0o), and hexadecimal (0x) number literals, completing the digit separator feature that was previously only available for decimal numbers.

This addresses the backlog item mentioned in https://github.com/mthom/scryer-prolog/pull/3132#issuecomment-3435777584

Specification

Implements: https://www.complang.tuwien.ac.at/ulrich/iso-prolog/digit_separators

Changes

  • Lexer enhancements (src/parser/lexer.rs):
    • Added skip_underscore_in_hexadecimal() helper function
    • Added skip_underscore_in_octal() helper function
    • Added skip_underscore_in_binary() helper function
    • Updated hexadecimal_constant() to use digit separators
    • Updated octal_constant() to use digit separators
    • Updated binary_constant() to use digit separators

Supported Syntax

All numeric bases now support the following digit separator patterns:

  1. Single underscore: 0xDE_AD, 0o7_6, 0b10_11
  2. Multiple underscores: 0x1_2_3_4, 0o1_2_3, 0b1_0_1_0
  3. Underscore with whitespace: 0xFF_ 00, 0o77_ 00, 0b1111_ 0000
  4. Underscore with comments: 0xDE_ /* test */ AD, 0o1_ /* octal */ 23, 0b10_ /* binary */ 11

Tests

Comprehensive test coverage at three levels:

  1. Rust unit tests (src/tests/parse_tokens.rs): 16 tests

    • Test token-level parsing for all numeric bases
    • Verify all separator variations (basic, multiple, whitespace, comments)
  2. Prolog integration tests (src/tests/digit_separators.pl): 18 tests

    • Test actual Prolog execution with digit separators
    • Cover edge cases (large numbers, case insensitivity)
  3. CLI tests (tests/scryer/cli/src_tests/digit_separators.toml): 1 config

    • End-to-end command-line testing

All tests pass successfully.

Examples

?- X = 0xDE_AD, X =:= 57005.
   true.

?- X = 0b1111_0000, X =:= 240.
   true.

?- X = 0o77_00, X =:= 4032.
   true.

?- X = 0xFF_ /* comment */ 00, X =:= 65280.
   true.

Checklist

  • [x] Implementation follows existing code patterns
  • [x] Comprehensive test coverage (Rust unit, Prolog integration, CLI)
  • [x] All tests pass
  • [x] Based on current upstream/master
  • [x] Follows ISO Prolog digit separator specification

jjtolton avatar Oct 24 '25 04:10 jjtolton

Please compare this with #3125!

triska avatar Oct 24 '25 05:10 triska

Please compare this with #3125!

oh thats embarrassing 😰

jjtolton avatar Oct 24 '25 05:10 jjtolton

Not at all, this may still be useful for comparisons. For instance, to count the number of strings with length N that form a valid integer, and compare the two different implementations.

triska avatar Oct 24 '25 06:10 triska

Follows ISO Prolog digit separator specification

You mean WG17.

Did you update the decimal part as well? An inconsistency (in the specification) was only discovered this week. To be clear, it was there since the 2010-11-19 proposal of O'Keefe and nobody noted it in 15 years. As a consequence, also Scryer has the following inconsistency: It supports option 9 but neither 10 nor 11. And probably it is best to reject all three. The specifications so far supported 9 and 11, but not 10 which does not make much sense, since 10 is, if at all, the only one with many digits.

(I am not sure how to proceed, and which approach is preferable. What is indeed important is the willingness to actively maintain the improvements. Like in Trealla.)

(In Scryer, I did not test number_chars for a couple of years because partial strings at that time consumed global resources which made testing impractical. Only now with their heap representation can I continue testing)

UWN avatar Oct 24 '25 06:10 UWN

?- number_chars(N,"1_0.0").
   N = 10.0, unexpected.

Did you update the decimal part as well?

Not yet.

UWN avatar Oct 24 '25 16:10 UWN

g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -v
9a840154
g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -f
?- number_chars(N,"0xa").
   error(syntax_error(lexical_error),number_chars/2), unexpected.

UWN avatar Oct 24 '25 19:10 UWN

g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -v
9a840154
g1:/opt/gupu/jj-scryer-prolog> target/release/scryer-prolog -f
?- number_chars(N,"0xa").
   error(syntax_error(lexical_error),number_chars/2), unexpected.

cdd8f913c5bf65cc580401d0be7a851d8c029323 :+1:

jjtolton avatar Oct 24 '25 23:10 jjtolton

?- number_chars(N,"0_").
   N = 0, unexpected.
   syntax_error(...).

UWN avatar Oct 25 '25 06:10 UWN

(it would be much better, if there would be an issue-link.)

UWN avatar Oct 25 '25 06:10 UWN

?- number_chars(N,"0_ ").
   N = 0, unexpected.
   syntax_error(...).

For two reasons: It is not a valid number and it is followed by a layout at the end.

UWN avatar Oct 25 '25 07:10 UWN

@jjtolton, do you no longer continue this attempt? Note that for the moment there is no way to directly compare Scryer to the other implementations, that is Trealla and SWI.

UWN avatar Oct 26 '25 09:10 UWN

@UWN https://github.com/mthom/scryer-prolog/commit/cdd8f913c5bf65cc580401d0be7a851d8c029323

jjtolton avatar Oct 26 '25 15:10 jjtolton

I will check for some other cases that may not have been covered.

jjtolton avatar Oct 26 '25 15:10 jjtolton

Ultimately we should probably go with @bakaq's approach since it is more compact and intentional, this approach is nearly entirely procedurally generated. But I am happy to provide it as a reference.

jjtolton avatar Oct 26 '25 15:10 jjtolton

Additional test cases added: https://github.com/mthom/scryer-prolog/commit/cdd8f913c5bf65cc580401d0be7a851d8c029323#r168798457

jjtolton avatar Oct 26 '25 15:10 jjtolton

@jjtolton, @bakaq: We unexpectedly went from a what in the administration is called a positive competence conflict (two parties independently seeing themselves as responsible) to a negative competence conflict (no party feels responsible)?

Now both #3125 and #3134 are closed even though they attempted to solve the same problem?

triska avatar Oct 27 '25 17:10 triska

This was an unusual situation but for the sake of consistency I will reopen this PR because that is what @UWN ended up testing against. The tests for the combined branch are pretty good now and leave open the possibility for an improved implementation when time allows.

jjtolton avatar Oct 27 '25 18:10 jjtolton

Fixed #3159: Digit separators now parse correctly when followed by period. Commit: 36d989bf —J.J.'s robot.

jjtolton avatar Nov 20 '25 22:11 jjtolton