scryer-prolog
Head unifications should construct compact strings when possible
The documentation of partial_string/3 states "It can be used as an optimized append/3.":
https://github.com/mthom/scryer-prolog/blob/f496abbbef33e974c596501ef835f9bb07585b22/src/lib/iso_ext.pl#L280
But why? Ideally, append/3 itself constructs a compact internal string representation.
When the engine sees the head:
append([A|As], Bs, [A|Cs]) :- ...
then this seems an opportunity to construct the third argument as a partial string under the right conditions.
This engine improvement would eliminate the need for partial_string/3, and also automatically benefit other predicates that construct strings.
Just an implementation remark. Ideally, the following could be done:
If a WAM instruction is about to write out [A|Cs] with A a character (in the determinate case), it checks whether the address of the variable to unify into is at the top of the heap. If so, it checks whether the preceding cell contains (the end of) a string. In that case, there is already a partial string that can be extended, and it can be extended destructively.
If this is not possible, one writes out a new partial string of length one, in the hope that the next inference will use it as a basis.
To make this work, all valid (non-string) cells need to be distinguishable from cells holding valid UTF-8 characters. This is essentially @bakaq's suggested scheme, which is now used here.
(Alternatively, use a flag that, if true, means "the topmost element on the heap is a partial string", and means "unknown" otherwise.)
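The heap-inspection scheme described above can be sketched on a toy heap. This is a minimal illustration under simplifying assumptions, not Scryer Prolog's actual data structures: `Cell`, `Heap`, and `write_char_cons` are hypothetical names, each cell holds one character (a real engine would pack UTF-8 bytes), the enum tag stands in for the requirement that string cells be distinguishable from other cells, and no choice points exist (the determinate case).

```rust
/// Toy heap cell. The enum tag plays the role of the bit pattern that
/// separates string data from ordinary cells (an assumption of this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cell {
    Char(char),  // one character of a compact string
    Tail,        // unbound variable (possibly the tail of a partial string)
    Ref(usize),  // bound variable, pointing to another heap address
}

struct Heap {
    cells: Vec<Cell>,
}

impl Heap {
    fn new() -> Self {
        Heap { cells: Vec::new() }
    }

    /// Allocate a fresh unbound variable and return its address.
    fn new_var(&mut self) -> usize {
        self.cells.push(Cell::Tail);
        self.cells.len() - 1
    }

    /// Unify the unbound variable at `target` with [c|NewTail] and
    /// return the address of NewTail.
    fn write_char_cons(&mut self, target: usize, c: char) -> usize {
        debug_assert_eq!(self.cells[target], Cell::Tail);
        let on_top = target == self.cells.len() - 1;
        let after_string =
            target > 0 && matches!(self.cells[target - 1], Cell::Char(_));
        if on_top && after_string {
            // Extend the existing partial string destructively:
            // the old tail cell becomes the new character.
            self.cells[target] = Cell::Char(c);
        } else {
            // Start a new partial string of length one and bind the target to it.
            let start = self.cells.len();
            self.cells.push(Cell::Char(c));
            self.cells[target] = Cell::Ref(start);
        }
        self.cells.push(Cell::Tail);
        self.cells.len() - 1
    }

    /// Collect the characters reachable from `addr` up to the unbound tail.
    fn read_string(&self, mut addr: usize) -> String {
        let mut out = String::new();
        loop {
            match self.cells[addr] {
                Cell::Char(c) => {
                    out.push(c);
                    addr += 1;
                }
                Cell::Ref(next) => addr = next,
                Cell::Tail => return out,
            }
        }
    }
}
```

In this sketch, writing 'a', 'b', 'c' one after another into a fresh variable produces a single contiguous run of character cells. If an unrelated cell is allocated in between, the next write falls back to starting a new run and linking to it with a `Ref`.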
Thank you! It seems the flag can be implemented with comparatively little effort on top of what we have already now.
True. And it only needs to be a safe approximation. No need to store it in a CP.
Maybe the best option would be, instead of a flag, a pointer to the terminating zero of the most recently created partial string whose next word, at the top of the stack, holds a variable. Thus, if that pointer is non-NULL, it points to the precise place to extend (and, when debugging, it can be safety-checked to be next to the top).
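A rough sketch of the pointer variant, again on a toy heap and not on Scryer Prolog's real representation. All names (`Cell`, `Heap`, `extendable`, `cons_byte`, `invalidate`) are hypothetical. The sketch assumes one byte per cell and that a reference to a string suffix is the address of a byte or of the terminator, so such references stay valid after an in-place extension. The hint is only a safe approximation: anything that might make it stale simply resets it to `None`, so it never has to be saved in a choice point.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cell {
    Byte(u8),    // one byte of string data
    Zero,        // string terminator; the tail cell is the next word
    Var,         // unbound variable
    Ref(usize),  // bound variable, pointing to another heap address
}

struct Heap {
    cells: Vec<Cell>,
    /// Address of the terminator of the last partial string whose tail
    /// variable is the topmost heap word, or None if unknown.
    extendable: Option<usize>,
}

impl Heap {
    fn new() -> Self {
        Heap { cells: Vec::new(), extendable: None }
    }

    /// To be called on choice point creation, backtracking, or any
    /// unrelated heap allocation. Forgetting the hint is always safe.
    fn invalidate(&mut self) {
        self.extendable = None;
    }

    fn new_var(&mut self) -> usize {
        self.invalidate();
        self.cells.push(Cell::Var);
        self.cells.len() - 1
    }

    /// Unify the unbound list position `target` with [b|NewTail] and
    /// return the position of NewTail (the address of the new terminator).
    fn cons_byte(&mut self, target: usize, b: u8) -> usize {
        if self.extendable == Some(target) {
            // Debug-only safety check: terminator, then variable, then heap top.
            debug_assert_eq!(self.cells[target], Cell::Zero);
            debug_assert_eq!(target + 2, self.cells.len());
            self.cells[target] = Cell::Byte(b);
            self.cells[target + 1] = Cell::Zero;
            self.cells.push(Cell::Var);
            self.extendable = Some(target + 1);
            return target + 1;
        }
        // Fallback: write a new partial string of length one.
        let var_addr = if self.cells[target] == Cell::Zero { target + 1 } else { target };
        debug_assert_eq!(self.cells[var_addr], Cell::Var);
        let start = self.cells.len();
        self.cells.extend_from_slice(&[Cell::Byte(b), Cell::Zero, Cell::Var]);
        self.cells[var_addr] = Cell::Ref(start);
        self.extendable = Some(start + 1);
        start + 1
    }

    /// Collect the bytes reachable from position `addr` up to the unbound tail.
    fn read_bytes(&self, mut addr: usize) -> Vec<u8> {
        let mut out = Vec::new();
        loop {
            match self.cells[addr] {
                Cell::Byte(b) => {
                    out.push(b);
                    addr += 1;
                }
                Cell::Zero => addr += 1, // the tail cell follows the terminator
                Cell::Ref(next) => addr = next,
                Cell::Var => return out,
            }
        }
    }
}
```

Compared with inspecting the cells below the heap top, the pointer makes the fast-path test a single comparison, and the debug assertions cover the "next to the top" check mentioned above.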