scryer-prolog
scryer-prolog copied to clipboard
Faster integer formatting to arbitrary base
In one of my cryptographic applications of Scryer Prolog, the bottleneck turns out to be formatting large integers in base 16, using library(format).
We can use the ~Nr format control sequence to format an integer to base (radix) N, such as for example:
?- phrase(format_("~16r", [121212]), Cs).
Cs = "1d97c".
With larger integers, this is currently somewhat slow. For example, with the following definitions:
:- use_module(library(format)).
:- use_module(library(time)).
:- use_module(library(between)).
:- use_module(library(dcgs)).
exp(E) :-
N is 2^E,
between(1, N, _).
We have:
?- I is 2^2^8,
time((exp(12),phrase(format_("~16r", [I]), _),false)).
% CPU time: 1.766s, 8_413_227 inferences
false.
The formatting code is written in Prolog and should preferably remain in Prolog, because if every tiny detail is hardcoded in Rust itself, then there will never be enough incentive to make actual Prolog code faster:
https://github.com/mthom/scryer-prolog/blob/7b33eafbb052b8543942965f2967d64e53d918c3/src/lib/format.pl#L372
The question I have is therefore: What are the most fundamental, smallest building blocks that would help to make this Prolog code faster? Only that functionality should be implemented in Rust, if at all.
For example, currently, both I0 mod R and I0 // R are computed in the code:
https://github.com/mthom/scryer-prolog/blob/7b33eafbb052b8543942965f2967d64e53d918c3/src/lib/format.pl#L392
I see that the dashu crate exposes functionality to compute both results in a single operation:
https://docs.rs/dashu-int/latest/dashu_int/struct.IBig.html#impl-DivRem-for-IBig
Would such a predicate be a sensible addition to library(arithmetic), to obtain both the quotient and the remainder of two integers with better efficiency than is currently possible?
Any other ideas? I would greatly appreciate all suggestions and help with this issue. Thank you a lot!
Rust uses the ryu crate to print floating point numbers to strings quickly but I'm not sure if there is an equivalent for numbers. perhaps @cmpute could advise?
There are several ways to support fast printing in dashu:
- Fast formatting with base 2, 8, 10 and 16 is already supported through the conventional Rust protocal. (
Display,Debug, etc) - Fast formatting with arbitrary base (up to 32) is already supported through
dashu_int::IBig::in_radixmethod. - Fast conversion to arbitrary base digits is in dev plan (the API currently in my mind is like
dashu_int::IBig::to_digits(base: Word) -> Vec<Word>) - Fast division with constant divisor is already supported through the
dashu_int::fast_div::ConstDivisor
Not sure which approach is best suitable for Scryer Prolog. Let me know as well if you want different APIs :)
Thank you a lot @cmpute! I have published a draft (#2362) that makes the in_radix method available as an internal predicate. library(format) supports bases up to 36, so using this method would require adjustments in library(format): It would mean that the fast code can be used for most of the cases, but not for all of the cases that the library covers.
It may be interesting to have this functionality, for very large integers, if the need arises. I will first try to improve format_//2 in other ways that ideally only require changes in the engine that are as generally useful as possible.
Oh my bad, there was a typo, in_radix actually supports radix up to 36, which is in line with your need. Besides, whether to use upper case for the output is supported through the alternative flag (as documented here)
@cmpute: Thank you a lot for the clarification!
In my tests, the test case above can be made to run about 4 times faster by using the native in_radix method instead of the Prolog-based implementation. Personally, I think this speedup does not justify making this particular API available in Scryer, since it further increases the size of the Rust-based engine and also adds additional complexity to library(format) which should ideally remain portable between Prolog systems, and thus also needs to handle the situation that such a native method is not available in other Prolog systems.
I wonder whether adding a more general predicate to Scryer that also covers the functionality of the in_radix method is worth adding instead, for example, a predicate such as format_integer(FormatString, Integer). However, it is unclear how the specific functionality of the in_radix method could be best plugged into such a construct, and also how much "Rust-specifics" (such as the conventions of Rust format strings) should be exposed by the interface which should ideally be independent of the underlying implementation language.
The "arbitrary base digits" API mentioned above seems interesting, and may indeed be worth exposing as an interface predicate when it becomes available, especially if it satisfies these desiderata.
I will link this issue to the commit when the digits conversion functionality is added. However, it worth noting that this API won't get as much optimization as the small base conversions (at least not in the near future), i.e. the speed improvement can be smaller.