scryer-prolog icon indicating copy to clipboard operation
scryer-prolog copied to clipboard

rebis-dev: Unexpected performance regression

Open triska opened this issue 8 months ago • 3 comments

For the following program:

:- use_module(library(lists)).

p(As) :-
        maplist(=(a), As).

We have with master:

?- wam_instructions(p/1, Is),
   maplist(portray_clause, Is).
get_variable(x(2),1).
put_structure(=,1,x(1)).
set_constant(a).
set_constant('$index_ptr'(43)).
execute(maplist,2).

Whereas with rebis-dev:

set_constant('$index_ptr'(43)).
get_variable(x(2),1).
put_structure(=,1,x(1)).
set_constant(a).
execute(maplist,2).

So, unexpectedly no precompiled goal with rebis-dev?

Hence

a significant timing difference between master and rebis-dev:

?- length(Ls, 1_000_000),
   time(maplist(=(a), Ls)).

With master: % CPU time: 0.201s, 2_000_033 inferences With rebis-dev: % CPU time: 0.378s, 2_000_034 inferences

This is the reason for https://github.com/mthom/scryer-prolog/issues/2907#issuecomment-2825531828.

triska avatar Apr 23 '25 21:04 triska

But it is inlined. The set_constant('$index_ptr'(43)) is before the put_structure instruction (index cells now come before structures in rebis-dev, not after like in master). I've checked in gdb and can confirm the inlined index is used by fast_call. So the slowdown is for some other reason.

mthom avatar Apr 24 '25 03:04 mthom

Profiling with perf, it seems to be the new OffsetTableImpl hosting CodeIndex, which is behind RcuRef, that is responsible for the overhead.

I wonder if it would make sense to swap table types behind a boxed trait object based on the number of active threads, so that the tables are only wrapped in RcuRef when more than one Scryer VM thread is active.

mthom avatar Apr 24 '25 06:04 mthom

Good news: 60695e8cf84be5a10cdec9f6c7333dc8aaadb0e6 is even faster than master for this benchmark:

?- length(Ls, 1_000_000),
   time(maplist(=(a), Ls)).
   % CPU time: 0.230s, 2_000_034 inferences
   Ls = "aaaaaaaaaaaaaaaaaaa ...".

whereas with master from around the time I originally filed the issue:

?- length(Ls, 1_000_000),
   time(maplist(=(a), Ls)).
   % CPU time: 0.268s, 2_000_033 inferences
   Ls = "aaaaaaaaaaaaaaaaaaa ...".

triska avatar Jun 22 '25 08:06 triska

Awesome, thank you a lot!

triska avatar Jun 27 '25 16:06 triska