perl5 icon indicating copy to clipboard operation
perl5 copied to clipboard

Implement OP_PADSV_STORE - combined sassign/padsv OP

Open richardleach opened this issue 3 years ago • 4 comments

This commit introduces a new OP to replace simple cases of OP_SASSIGN and OP_PADSV.

For example, my $x = 1 is currently implemented as:

1  <;> nextstate(main 1 -e:1) v:{
2  <$> const(IV 1) s
3  <0> padsv[$x:1,2] sRM*/LVINTRO
4  <2> sassign vKS/2

But now will be turned into:

1  <;> nextstate(main 1 -e:1) v:{
2  <$> const(IV 1) s
3  <1> padsv_store[$x:1,2] vKMS/LVINTRO

This intended to be a transparent performance optimization. It should be applicable for RHS optrees of varying complexity.

Notes:

  • This optimization was suggested by @leonerd as a good project for getting feet wet.
  • This was my first foray into Deparse.pm; no idea if this was the most appropriate way of supporting this new op.
  • Ideas for separate optimizations of simple assignments of CONSTs, PADSVs, and UNDEF into single ops are already in the works.

Performance: The following unrealistic code shows the op-related speedup:

my ($l0, $l1, $l2, $l3, $l3, $l5, $l6, $l7, $l7, $l9);
for (0..10_000_000) {
    $l0 = $l1 = $l2 = $l3 = $l4 = $l5 = $l6 = $l7 = $l8 = $l9 = 1 
}

Blead:

            870.42 msec task-clock                #    0.985 CPUs utilized          
                 1      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               202      page-faults               #    0.232 K/sec                  
     3,800,528,668      cycles                    #    4.366 GHz                    
        79,206,368      stalled-cycles-frontend   #    2.08% frontend cycles idle   
           217,927      stalled-cycles-backend    #    0.01% backend cycles idle    
    15,523,885,258      instructions              #    4.08  insn per cycle         
                                                  #    0.01  stalled cycles per insn
     3,030,769,392      branches                  # 3481.972 M/sec                  
                 0      branch-misses             #    0.00% of all branches        

       0.883580599 seconds time elapsed

Patched:

            721.04 msec task-clock                #    0.983 CPUs utilized          
                 1      context-switches          #    0.001 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               197      page-faults               #    0.273 K/sec                  
     3,163,626,846      cycles                    #    4.388 GHz                    
        51,457,402      stalled-cycles-frontend   #    1.63% frontend cycles idle   
                30      stalled-cycles-backend    #    0.00% backend cycles idle    
    12,563,719,698      instructions              #    3.97  insn per cycle         
                                                  #    0.00  stalled cycles per insn
     2,470,736,436      branches                  # 3426.638 M/sec                  
                 0      branch-misses             #    0.00% of all branches        

       0.733857229 seconds time elapsed

The extent to which users will notice a benefit in real code will depend on things like how often the optimized op is used and what path is taken through Perl_sv_setsv.

richardleach avatar Jul 11 '22 21:07 richardleach

Any further comments on this one, or does it look good to merge?

richardleach avatar Jul 31 '22 11:07 richardleach

Thanks for spotting that. I'll take a look later today. Might not be straightforward - regen complains that: addbits(): bit 4 of padsv_store already specified (OPpTARGET_MY) at ./regen/op_private line 417.

(I can't remember offhand if B::Deparse needs the OPpTARGET_MY flag.)

richardleach avatar Aug 01 '22 16:08 richardleach

Assuming all CI tests pass, OP_PADSV_CONST doesn't seem to need the OPpTARGET_MY flag, so I've taken that out and added in the OPpDEREF flag bits.

richardleach avatar Aug 01 '22 18:08 richardleach

After working on #20036 and looking a bit more at this PR, I realised that the basic optimization will not need to support OPpDEREF and have removed that branch from pp_padsv_store.

Note: It also seemed simple enough to add support for assigning simple padsv-padsv-sassign optrees, which I originally did. However, this involved turning OP_PADSV_STORE into an UNOP_aux and stashing the second padoffset in the op_aux slot. That worked fine, but there was no way to make Deparse.pm work (short of hacking around in B.xs, which definitely seems like a bad move.) So I dropped that commit.

Hopefully, UNOPs (and possibly other op types) will soon get an op_src member, at which point the padsv-padsv-sassign commit can be tweaked to use it and submitted as a separate PR.

richardleach avatar Aug 06 '22 15:08 richardleach