perl5
Add OPpTARGET_MY optimization to OP_UNDEF
This allows the existing undef OP to act on a pad SV. The following
two cases are optimized:
undef my $x, currently implemented as:

    4     <1> undef vK/1 ->5
    3        <0> padsv[$x:1,2] sRM/LVINTRO ->4
my $x = undef, currently implemented as:

    5     <2> sassign vKS/2 ->6
    3        <0> undef s ->4
    4        <0> padsv[$x:1,2] sRM*/LVINTRO ->5
These are now just represented as:

    3     <1> undef[$x:1,2] vK/LVINTRO,TARGMY ->4
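
The optree shapes above can be checked on any given perl build with B::Concise. The following is a minimal sketch, not part of the patch: the helper name concise_for is made up for illustration, and the exact output (op sequence numbers, flags, and whether TARGMY appears) depends on the perl it is run with.

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical helper: render the optree of a code ref into a string
# instead of printing it to STDOUT.
sub concise_for {
    my ($code) = @_;
    my $out = '';
    B::Concise::walk_output(\$out);                     # capture into $out
    my $walker = B::Concise::compile('-basic', $code);  # tree-order rendering
    $walker->();
    return $out;
}

# The two forms discussed in the commit message.
print "undef my \$x:\n",   concise_for(sub { undef my $x });
print "my \$x = undef:\n", concise_for(sub { my $x = undef });
```

On a perl without this change, the second dump should still show the sassign, undef and padsv ops; on a patched perl both dumps are expected to show a single undef op carrying the pad target.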
The undef $x case gets a modest performance boost (about 16% less task-clock time), as shown in this toy example:

    my $x; for (0..10_000_000) { undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x; undef $x }
Blead:

            714.52 msec task-clock              #    0.956 CPUs utilized
                 1      context-switches        #    0.001 K/sec
                 0      cpu-migrations          #    0.000 K/sec
               198      page-faults             #    0.277 K/sec
     3,170,725,221      cycles                  #    4.438 GHz
         5,948,953      stalled-cycles-frontend #    0.19% frontend cycles idle
           231,911      stalled-cycles-backend  #    0.01% backend cycles idle
    10,843,683,798      instructions            #    3.42 insn per cycle
                                                #    0.00 stalled cycles per insn
     2,260,727,959      branches                # 3163.998 M/sec
Patched:

            602.19 msec task-clock              #    0.974 CPUs utilized
                 1      context-switches        #    0.002 K/sec
                 0      cpu-migrations          #    0.000 K/sec
               199      page-faults             #    0.330 K/sec
     2,724,503,940      cycles                  #    4.524 GHz
           313,679      stalled-cycles-frontend #    0.01% frontend cycles idle
           191,871      stalled-cycles-backend  #    0.01% backend cycles idle
     8,943,649,796      instructions            #    3.28 insn per cycle
                                                #    0.00 stalled cycles per insn
     1,760,721,729      branches                # 2923.875 M/sec
The $x = undef case, where the separate sassign and padsv ops are folded into the single undef op, improves much more, roughly halving task-clock time, as shown in this toy example:

    my $x; for (0..10_000_000) { $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef; $x = undef }
Blead:

          1,224.48 msec task-clock              #    0.990 CPUs utilized
                 2      context-switches        #    0.002 K/sec
                 0      cpu-migrations          #    0.000 K/sec
               205      page-faults             #    0.167 K/sec
     5,663,539,065      cycles                  #    4.625 GHz
         4,336,701      stalled-cycles-frontend #    0.08% frontend cycles idle
             4,385      stalled-cycles-backend  #    0.00% backend cycles idle
    19,644,168,889      instructions            #    3.47 insn per cycle
                                                #    0.00 stalled cycles per insn
     3,760,824,142      branches                # 3071.356 M/sec
Patched:

            628.51 msec task-clock              #    0.974 CPUs utilized
                 1      context-switches        #    0.002 K/sec
                 0      cpu-migrations          #    0.000 K/sec
               200      page-faults             #    0.318 K/sec
     2,716,312,069      cycles                  #    4.322 GHz
           432,725      stalled-cycles-frontend #    0.02% frontend cycles idle
           224,782      stalled-cycles-backend  #    0.01% backend cycles idle
     8,943,691,044      instructions            #    3.29 insn per cycle
                                                #    0.00 stalled cycles per insn
     1,760,729,375      branches                # 2801.443 M/sec
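
To put the perf stat figures above side by side, the blead/patched ratios can be worked out with a few lines of Perl. This is only an illustrative sketch: the numbers are copied from the output above, and the ratio helper is a made-up name.

```perl
use strict;
use warnings;

# Ratio of a "before" measurement to an "after" measurement;
# values above 1 mean the patched build needed less.
sub ratio {
    my ($before, $after) = @_;
    return $before / $after;
}

# Figures copied from the perf stat output: expression, metric, blead, patched.
my @cases = (
    [ 'undef $x',   'task-clock (msec)', 714.52,         602.19        ],
    [ 'undef $x',   'instructions',      10_843_683_798, 8_943_649_796 ],
    [ '$x = undef', 'task-clock (msec)', 1224.48,        628.51        ],
    [ '$x = undef', 'instructions',      19_644_168_889, 8_943_691_044 ],
);

for my $case (@cases) {
    my ($code, $metric, $blead, $patched) = @$case;
    printf "%-11s %-18s %.2fx\n", $code, $metric, ratio($blead, $patched);
}
```

This gives about 1.19x (time) and 1.21x (instructions) for undef $x, and about 1.95x and 2.20x for $x = undef. The patched instruction counts for the two loops are nearly identical, which is consistent with both forms now compiling to the same single op.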
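Also some bench.pl comparisons. Figures are relative to blead at 100.00, and higher is better; the "undef" column is the patched build: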

expr::sassign::undef_lex
$x = undef

            blead  undef
           ------ ------
        Ir 100.00 318.00
        Dr 100.00 317.65
        Dw 100.00 412.50
      COND 100.00 300.00
       IND 100.00 150.00
    COND_m 100.00 100.00
     IND_m 100.00 200.00
     Ir_m1 100.00 100.00
     Dr_m1 100.00 100.00
     Dw_m1 100.00 100.00
     Ir_mm 100.00 100.00
     Dr_mm 100.00 100.00
     Dw_mm 100.00 100.00

expr::sassign::undef_lex_direc
undef $x

            blead  undef
           ------ ------
        Ir 100.00 142.00
        Dr 100.00 158.82
        Dw 100.00 162.50
      COND 100.00 142.86
       IND 100.00 150.00
    COND_m 100.00 100.00
     IND_m 100.00 150.00
     Ir_m1 100.00 100.00
     Dr_m1 100.00 100.00
     Dw_m1 100.00 100.00
     Ir_mm 100.00 100.00
     Dr_mm 100.00 100.00
     Dw_mm 100.00 100.00

expr::sassign::undef_my_lex
my $x = undef

            blead  undef
           ------ ------
        Ir 100.00 164.50
        Dr 100.00 180.85
        Dw 100.00 196.15
      COND 100.00 173.68
       IND 100.00 133.33
    COND_m 100.00 100.00
     IND_m 100.00 200.00
     Ir_m1 100.00 100.00
     Dr_m1 100.00 100.00
     Dw_m1 100.00 100.00
     Ir_mm 100.00 100.00
     Dr_mm 100.00 100.00
     Dw_mm 100.00 100.00

expr::sassign::undef_my_lex_direc
undef my $x

            blead  undef
           ------ ------
        Ir 100.00 112.43
        Dr 100.00 123.40
        Dw 100.00 119.23
      COND 100.00 115.79
       IND 100.00 133.33
    COND_m 100.00 100.00
     IND_m 100.00 150.00
     Ir_m1 100.00 100.00
     Dr_m1 100.00 100.00
     Dw_m1 100.00 100.00
     Ir_mm 100.00 100.00
     Dr_mm 100.00 100.00
     Dw_mm 100.00 100.00