rebis-dev: Unexpected performance regression
For the following program:
:- use_module(library(lists)).
p(As) :-
maplist(=(a), As).
We have with master:
?- wam_instructions(p/1, Is),
maplist(portray_clause, Is).
get_variable(x(2),1).
put_structure(=,1,x(1)).
set_constant(a).
set_constant('$index_ptr'(43)).
execute(maplist,2).
Whereas with rebis-dev:
set_constant('$index_ptr'(43)).
get_variable(x(2),1).
put_structure(=,1,x(1)).
set_constant(a).
execute(maplist,2).
So, unexpectedly no precompiled goal with rebis-dev?
Hence
a significant timing difference between master and rebis-dev:
?- length(Ls, 1_000_000), time(maplist(=(a), Ls)).
With master: % CPU time: 0.201s, 2_000_033 inferences
With rebis-dev: % CPU time: 0.378s, 2_000_034 inferences
This is the reason for https://github.com/mthom/scryer-prolog/issues/2907#issuecomment-2825531828.
But it is inlined. The set_constant('$index_ptr'(43)) is before the put_structure instruction (index cells now come before structures in rebis-dev, not after like in master). I've checked in gdb and can confirm the inlined index is used by fast_call. So the slowdown is for some other reason.
Profiling with perf, it seems to be the new OffsetTableImpl hosting CodeIndex, which is behind RcuRef, that is responsible for the overhead.
I wonder if it would make sense to swap table types behind a boxed trait object based on the number of active threads, so that the tables are only wrapped in RcuRef when more than one Scryer VM thread is active.
Good news: 60695e8cf84be5a10cdec9f6c7333dc8aaadb0e6 is even faster than master for this benchmark:
?- length(Ls, 1_000_000), time(maplist(=(a), Ls)). % CPU time: 0.230s, 2_000_034 inferences Ls = "aaaaaaaaaaaaaaaaaaa ...".
whereas with master from around the time I originally filed the issue:
?- length(Ls, 1_000_000), time(maplist(=(a), Ls)). % CPU time: 0.268s, 2_000_033 inferences Ls = "aaaaaaaaaaaaaaaaaaa ...".
Awesome, thank you a lot!