perl5
perl5 copied to clipboard
Implement OP_PADSV_STORE - combined sassign/padsv OP
This commit introduces a new OP to replace simple cases of OP_SASSIGN and OP_PADSV.
For example, my $x = 1 is currently implemented as:
1 <;> nextstate(main 1 -e:1) v:{
2 <$> const(IV 1) s
3 <0> padsv[$x:1,2] sRM*/LVINTRO
4 <2> sassign vKS/2
But now will be turned into:
1 <;> nextstate(main 1 -e:1) v:{
2 <$> const(IV 1) s
3 <1> padsv_store[$x:1,2] vKMS/LVINTRO
This intended to be a transparent performance optimization. It should be applicable for RHS optrees of varying complexity.
Notes:
- This optimization was suggested by @leonerd as a good project for getting feet wet.
- This was my first foray into Deparse.pm; no idea if this was the most appropriate way of supporting this new op.
- Ideas for separate optimizations of simple assignments of CONSTs, PADSVs, and UNDEF into single ops are already in the works.
Performance: The following unrealistic code shows the op-related speedup:
my ($l0, $l1, $l2, $l3, $l3, $l5, $l6, $l7, $l7, $l9);
for (0..10_000_000) {
$l0 = $l1 = $l2 = $l3 = $l4 = $l5 = $l6 = $l7 = $l8 = $l9 = 1
}
Blead:
870.42 msec task-clock # 0.985 CPUs utilized
1 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
202 page-faults # 0.232 K/sec
3,800,528,668 cycles # 4.366 GHz
79,206,368 stalled-cycles-frontend # 2.08% frontend cycles idle
217,927 stalled-cycles-backend # 0.01% backend cycles idle
15,523,885,258 instructions # 4.08 insn per cycle
# 0.01 stalled cycles per insn
3,030,769,392 branches # 3481.972 M/sec
0 branch-misses # 0.00% of all branches
0.883580599 seconds time elapsed
Patched:
721.04 msec task-clock # 0.983 CPUs utilized
1 context-switches # 0.001 K/sec
0 cpu-migrations # 0.000 K/sec
197 page-faults # 0.273 K/sec
3,163,626,846 cycles # 4.388 GHz
51,457,402 stalled-cycles-frontend # 1.63% frontend cycles idle
30 stalled-cycles-backend # 0.00% backend cycles idle
12,563,719,698 instructions # 3.97 insn per cycle
# 0.00 stalled cycles per insn
2,470,736,436 branches # 3426.638 M/sec
0 branch-misses # 0.00% of all branches
0.733857229 seconds time elapsed
The extent to which users will notice a benefit in real code will depend on things like how often the optimized op is used and what path is taken through Perl_sv_setsv.
Any further comments on this one, or does it look good to merge?
Thanks for spotting that. I'll take a look later today. Might not be straightforward - regen complains that:
addbits(): bit 4 of padsv_store already specified (OPpTARGET_MY) at ./regen/op_private line 417.
(I can't remember offhand if B::Deparse needs the OPpTARGET_MY flag.)
Assuming all CI tests pass, OP_PADSV_CONST doesn't seem to need the OPpTARGET_MY flag, so I've taken that out and added in the OPpDEREF flag bits.
After working on #20036 and looking a bit more at this PR, I realised that the basic optimization will not need to support OPpDEREF and have removed that branch from pp_padsv_store.
Note: It also seemed simple enough to add support for assigning simple padsv-padsv-sassign optrees, which I originally did. However, this involved turning OP_PADSV_STORE into an UNOP_aux and stashing the second padoffset in the op_aux slot. That worked fine, but there was no way to make Deparse.pm work (short of hacking around in B.xs, which definitely seems like a bad move.) So I dropped that commit.
Hopefully, UNOPs (and possibly other op types) will soon get an op_src member, at which point the padsv-padsv-sassign commit can be tweaked to use it and submitted as a separate PR.