perl5 Create CvSUBOVERRIDE to speed up common sub idioms

This new code path does two separate things designed to allow common function patterns to be replaced with a static C function at runtime:

First it adds two additional fields to CV and a check inside pp_entersub that, if set, will call a C function instead of setting up a perl stack frame and calling the original function. This call can return false to indicate it was unsuccessful and allow the original optree to operate as normal.

Second it adds an initial optimization that identifies a common accessor method pattern and attaches them with a static hash-key lookup function. The optimization is smart enough to handle cases when the function is also a mutator and passes control back to the optree to be executed.

This combination enables common OO-style accessor to return as quick as an XS version is able to.

A compiler flag, -Accflags=-DPERL_CV_OVERRIDE, must be given to enable the feature; both the check inside pp_entersub as well as initial optimization.

I have opened this as draft PR, I believe the concept in general is technically viable but I am unsure if it is a good fit. I can appreciate that as a new contributor this is touching some important bits, but I had an interest in learning about perl's internals and had this as a specific goal in mind. I had previously opened an earlier draft PR with a much different in-progress version; the feedback I received from that significantly changed the final approach.

I benchmarked the change with Porting/bench.pl, which my interpretation of the information indicates that on average the call:: benchmarks are about 2% slower overall, but the new call::sub::accessor benchmark is about 220% faster. I understand that making every call slower, even if by a small amount, is not going to work with every workload, and so I decided to put the optimization behind a compiler flag.

I did another benchmark by creating a script to compare several popular object systems. The results of those appear to show that this optimization allows accessors that use the optimized pattern to be within the range of speed of some popular XS accessor modules.

Both the script I created above as well as my output from Porting/bench.pl is available in the below gist. In both cases blead refers to a compiled version of perl without -Accflags=-DPERL_CV_OVERRIDE, and hacked refers to a compiled version of perl with that flag defined. https://gist.github.com/atrodo/98f6cde36c8fa9954de34957be876493

Jan 12 '24 04:01 atrodo

Two general comments. First, AFAIKT, nowhere do you state what actual "common accessor method pattern"(s) it overrides. Second, You have a couple of non-trivial C functions with no code comments at the top to say what they do.

Then a specific observation: I doubt that this code will work correctly under PERL_RC_STACK. As a general rule, no new code in core should include dSP and all the macros associated with it. See the new section "Reference-counted argument stack" in perlguts.pod.

I haven't understood the PR well enough yet (see the the two general comments above) to form an opinion on it.

Do you have any firm ideas/plans for further common method patterns to override?

Jan 12 '24 13:01 iabyn

Plus my overall thinking is that this is quite a large body of work to surprise on folks from a new contributor. I would have much preferred to have some initial discussion on the aims and directions first, so we can give some more specific advice on how best to go about doing it.

Jan 12 '24 13:01 leonerd

Some further observations:

Since part of this commit is to provide a new facility to effectively add a "lightweight XS" function to a CV that is allowed to bail out early and pass control back to the "slow" code path if its a set-up it can't handle, then this facility needs to be properly documented. In particular, it needs to make clear what normal XS functionality is being skipped and needs to be handled by the function itself if necessary (such as not replacing PADTMP arguments with copies). It also needs to make clear that if the function returns false, the stack etc should be left in its original state, so that the normal perl op code path can still be performed.
Note that S_defgv_hek_accessor() appears to violate that - the (!isGV_with_GP) test returns false, but by then the stack has already been popped.
The SV stored in CvSUBOVERRIDEAUX(cv) doesn't appear to be freed when the CV is freed.

Jan 12 '24 16:01 iabyn

First, AFAIKT, nowhere do you state what actual "common accessor method pattern"(s) it overrides.

I apologize, you are correct, I did not state it explicitly. It is hidden away in the S_cv_override_create function, and I should have brought the examples to the forefront. The common patterns this is able to handle are:

sub foo { return $_[0]->{foo} } sub foo { @_ > 1 && ...; $_[0]->{foo} } sub foo { @_ != 1 && ...; $_[0]->{foo} } sub foo { return $_[0]->{foo} if @_ == 1 }

Second, You have a couple of non-trivial C functions with no code comments at the top to say what they do.

I will plan to add those, it was simply an oversight on my part. The quick summary of the two functions are:

S_defgv_hek_accessor: This is the function that runs at runtime attempting to optimize an accessor pattern. It will check that the number of arguments to a function call is exactly 1, and confirms that the single item is a hashref. If so, it returns the hash value of a static key stored in CvSUBOVERRIDEAUX.

S_cv_override_create: This is the optimizer step that looks at the function as a whole and determines if an optimzation can be made. If so, it enables CvIsSUBOVERRIDE, sets the function to call to S_defgv_hek_accessor, and saves the constant key.

Then a specific observation: I doubt that this code will work correctly under PERL_RC_STACK. As a general rule, no new code in core should include dSP and all the macros associated with it. See the new section "Reference-counted argument stack" in perlguts.pod.

That's very possible. Some time ago I did attempt to compile with PERL_RC_STACK and PERL_CV_OVERRIDE, and my memory is that nothing immediately failed. I will make sure to check again and work on adjusting it to take care of the related macros before taking the draft status off this PR.

Do you have any firm ideas/plans for further common method patterns to override?

My goal initially was to support how OO framework generate their accessors, and the patterns above covers some of the more common frameworks, Moo and Moose specifically.

I have thought about what else it could be used for, and I'm not going to say all of those ideas are valuable, just that they are simple patterns that I've used that maybe don't require a full stack to be created in order to function in all cases.

Accessors that access a pad value instead of a constant { my $k = 'foo'; sub foo { return $_[0]->{$k} } }
Accessors that do a little more checking, i.e. sub foo { die "..." if ref $_[0] ne "..."; return $_[0]->{foo}; }
Simple mutators of the form sub foo { exists $_[1] && $_[0]->{foo} = $_[1] }
Functions guarded by a global or local variable, i.e. sub info { return unless $FOO_INFO; ... }
Simple comparison sort functions, i.e. sub sort { $a cmp $b }
Simple check functions, like sub iscoderef { defined $_[0] && ref $_[0] eq 'CODE' }
Empty functions being called on an object i.e. sub foo {}
Simple array/hash creation functions, i.e. sub foo { return { foo => { @_ } } }

The strength of the way it works currently is that the static function doesn't have to handle the entire body of the function; it can see that the conditional it is optimizing for is not true and allow the optree to take over. So in the case of return unless $FOO, it can do the global lookup to see if the stack and optree need to run at all.

This PR is the result of an exercise in digging deeper into the perl internals, something I feel has been a good exercise for me, personally. I opened this PR because I had this code complete and seems from a surface level to be an improvement for at least some workloads. I opened it as a draft PR specifically to check to see if the changes were something others would be interested in. If this is the type of optimization that doesn't fit in the project, I am happy to close the PR and move on to something else. If the idea has legs, I'm happy to do any changes in comments, style or methodology to bring it up to par with expectations.

Jan 12 '24 17:01 atrodo

Some further observations:

1. Since part of this commit is to provide a new facility to effectively add a "lightweight XS" function to a CV that is allowed to bail out early and pass control back to the "slow" code path if its a set-up it can't handle, then this facility needs to be properly documented. In particular, it needs to make clear what normal XS functionality is being skipped and needs to be handled by the function itself if necessary (such  as not replacing PADTMP arguments with copies). It also needs to make clear that if the function returns false, the stack etc should be left in its original state, so that the normal perl op code path can still be performed.

That is a good description of what it's doing. I plan to add documentation about it.

2. Note that S_defgv_hek_accessor() appears to violate that - the (!isGV_with_GP) test returns false, but by then the stack has already been popped.

That section is copied from do_HV_rv2hv_helem, so I was unaware of those side effects. I will dig into what its actually doing and adjust that conditional.

3. The SV stored in CvSUBOVERRIDEAUX(cv) doesn't appear to be freed when the CV is freed.

Had not thought about a CV getting freed. I will see what I can do.

Jan 12 '24 18:01 atrodo

sub foo { return $_[0]->{foo} }

A similar one that I have often seen is { shift->{foo} }, and in all cases with or without an explicit return.

Jan 12 '24 23:01 hvds

On Fri, Jan 12, 2024 at 10:06:25AM -0800, Jon Gentle wrote:

3. The SV stored in CvSUBOVERRIDEAUX(cv) doesn't appear to be freed when the CV is freed.
Had not thought about a CV getting freed. I will see what I can do.

And I've just remembered: in addition, the pointer to the SV needs copying (and the SV's ref count incremented) each time the CV is cloned by S_cv_clone() - cloning being making a new anon sub at runtime by copying from a sub prototype, as used in 'my $s = sub { ....}'.

You also need to duplicate the SV (in the SVt_PVCV/SVt_PVFM case in S_sv_dup_common()) when the CV is being duplicated, which is what happens when a new ithread is being created.

It would also be worthwhile adding tests which fail without these fixups. t/op/svleak.t is a good place to detect the SV not being freed. In the eleak() loop, you could both declare a named sub and then delete it from the symbol table.

Not being cloned might be harder to test: anon CVs are only cloned when they are acting as a closure, i.e. they reference a lexical var in an outer scope, and I guess in that case it wouldn't be a candidate for CvSUBOVERRIDE() anyway?

In t/op/threads.t: in a child thread you could call the accessor method and see if it returns the right hash element.

-- A walk of a thousand miles begins with a single step... then continues for another 1,999,999 or so.

Jan 13 '24 13:01 iabyn

perl5 perl5 copied to clipboard

Create CvSUBOVERRIDE to speed up common sub idioms

perl5
perl5 copied to clipboard