Head unifications should construct compact strings when possible

Open triska opened this issue 7 months ago • 3 comments

The documentation of partial_string/3 states "It can be used as an optimized append/3.":

https://github.com/mthom/scryer-prolog/blob/f496abbbef33e974c596501ef835f9bb07585b22/src/lib/iso_ext.pl#L280

But why? Ideally, append/3 itself constructs a compact internal string representation.

When the engine sees the head:

append([A|As], Bs, [A|Cs]) :- ...

then this seems an opportunity to construct the third argument as a partial string under the right conditions.

This engine improvement would eliminate the need for partial_string/3, and also automatically benefit other predicates that construct strings.

triska, Apr 26 '25 10:04

Just an implementation remark. Ideally, the following could be done:

When a WAM instruction is about to write out [A|Cs] with A a character (in the determinate case), it checks whether the variable to unify into is at the top of the heap. If so, it checks whether the preceding cell contains (the end of) a string. In that case, there is already a partial string to extend, and one can extend it destructively.

And if this is not possible one writes out a new partial string of length one, in the hope that the next inference will use it as a basis.
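
The check described above can be sketched with a toy heap model. This is only an illustration under stated assumptions: the `Cell` and `Heap` types and the `write_char_cons` name are hypothetical and do not mirror Scryer Prolog's actual heap layout, and the sketch ignores choice points and backtracking entirely (it covers the determinate case only).

```rust
// Toy model only: hypothetical types, not Scryer Prolog's real heap layout.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Cell {
    Str(String), // compact partial string; the next cell is its tail
    Var,         // unbound variable
    Ref(usize),  // variable bound to the term at this address
    Other,       // stands in for every non-string term
}

struct Heap {
    cells: Vec<Cell>,
}

impl Heap {
    fn new() -> Self {
        Heap { cells: Vec::new() }
    }

    /// Allocate one cell on top of the heap and return its address.
    fn alloc(&mut self, cell: Cell) -> usize {
        self.cells.push(cell);
        self.cells.len() - 1
    }

    /// Unify the unbound variable at `var` with `[c|Tail]`.
    /// Returns the address of the new tail variable.
    fn write_char_cons(&mut self, var: usize, c: char) -> usize {
        assert!(
            matches!(self.cells[var], Cell::Var),
            "target must be an unbound variable"
        );
        let top = self.cells.len() - 1;
        // Extend only if `var` is the topmost cell and the cell right
        // before it is a string: then `var` is that string's tail.
        let extendable =
            var == top && var > 0 && matches!(self.cells[var - 1], Cell::Str(_));
        if extendable {
            if let Cell::Str(s) = &mut self.cells[var - 1] {
                s.push(c); // destructive extension
            }
            var // the tail variable stays where it is
        } else {
            // Otherwise start a new partial string of length one and
            // bind `var` to it, hoping the next inference extends it.
            let s_addr = self.cells.len();
            self.cells.push(Cell::Str(c.to_string()));
            self.cells.push(Cell::Var);
            self.cells[var] = Cell::Ref(s_addr);
            s_addr + 1
        }
    }
}
```

In this model, calling `write_char_cons` three times in a row on the returned tail address (as a recursive append/3 would) leaves a single `Str("abc")` cell followed by one tail variable. If anything else is allocated in between, the next call falls back to a fresh one-character string.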

To make this work, all valid (non-string) cells need to be distinguishable from cells holding valid UTF-8 characters. This is essentially the scheme suggested by @bakaq, which is now used here.

(Alternatively, keep a flag that means "the topmost element on the heap is a partial string" when true, and "unknown" otherwise.)

UWN, Apr 26 '25 17:04

Thank you! It seems the flag can be implemented with comparatively little effort on top of what we have already now.

triska, Apr 26 '25 18:04

> Thank you! It seems the flag can be implemented with comparatively little effort on top of what we have already now.

True. And it only needs to be a safe approximation. No need to store it in a CP.

UWN, Apr 26 '25 18:04

Maybe the best option would be, instead of a flag, a pointer to the zero of the last created partial string that has a variable in the next word, on top of the stack. Thus, if that pointer is non-NULL, it points to the precise place (and, when debugging, it can be safety-checked to be next to the top).
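
A minimal sketch of this pointer variant, again as a toy model: `Word`, `TrackedHeap`, and `open_str` are hypothetical names, and the pointer here holds the address of the string word rather than of its terminating zero, which the toy does not model. Every ordinary allocation resets the pointer to "unknown", so it is only ever a safe approximation. Choice points and backtracking are not modeled.

```rust
// Toy model only: hypothetical types, not Scryer Prolog's real heap layout.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Word {
    Str(String), // compact partial string; the next word is its tail
    Var,         // unbound variable
    Ref(usize),  // variable bound to the term at this address
    Other,       // stands in for every non-string term
}

struct TrackedHeap {
    words: Vec<Word>,
    /// Address of the last created partial string whose tail variable is
    /// the topmost heap word. `None` means "unknown", which is always safe.
    open_str: Option<usize>,
}

impl TrackedHeap {
    fn new() -> Self {
        TrackedHeap { words: Vec::new(), open_str: None }
    }

    /// Ordinary allocation: whatever was on top is no longer on top,
    /// so the pointer is reset.
    fn push(&mut self, w: Word) -> usize {
        self.open_str = None;
        self.words.push(w);
        self.words.len() - 1
    }

    /// Unify the unbound variable at `var` with `[c|Tail]`.
    /// Returns the address of the new tail variable.
    fn write_char_cons(&mut self, var: usize, c: char) -> usize {
        if let Some(s_addr) = self.open_str {
            if var == s_addr + 1 {
                // Safety check for debugging: the tail must be on top.
                debug_assert_eq!(var + 1, self.words.len());
                if let Word::Str(s) = &mut self.words[s_addr] {
                    s.push(c); // destructive extension, no heap scan needed
                    return var;
                }
            }
        }
        // Pointer unknown or not applicable: start a new partial string
        // of length one and remember it.
        let s_addr = self.words.len();
        self.words.push(Word::Str(c.to_string()));
        self.words.push(Word::Var);
        self.words[var] = Word::Ref(s_addr);
        self.open_str = Some(s_addr);
        s_addr + 1
    }
}
```

Compared with inspecting the preceding cell, this version never has to decide whether an arbitrary heap word is string data. It only compares the target variable's address against one remembered address.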

UWN, Oct 17 '25 17:10