proof of concept/performance test for use float

Open tonycoz opened this issue 4 years ago • 9 comments

This is an attempt at #17813

I tested performance with a simple Mandelbrot set generator (on my old CPU):

tony@mars:.../git/perl2$ time ./perl -Ilib ../mandel.pl

real    0m25.752s
user    0m25.612s
sys     0m0.132s
tony@mars:.../git/perl2$ time ./perl -Ilib -Mfeature=float ../mandel.pl

real    0m19.751s
user    0m19.742s
sys     0m0.004s

tonycoz avatar Jun 02 '20 06:06 tonycoz

I see two upvotes - did anyone else try benchmarking this on more useful code?

I ask to see if it's worth developing this further.

I implemented this as a feature, but it doesn't really belong there: it's not a language feature as such, and it shouldn't be enabled by a feature version bundle.

I'm hesitant to use a hints bit since we're fairly short on them.

Simply using an entry in %^H has the same problems that it did for the indirect feature before features were cached in cop_features: we'd be adding a hash lookup for every binop or unop generated.

Maybe it could be implemented as a feature, but not included in the all feature set, and not documented in feature.pm.

tonycoz avatar Aug 03 '20 06:08 tonycoz

This still seems worthwhile to me, but none of my useful code really uses float math, so I have nothing handy to benchmark.

richardleach avatar Aug 05 '20 07:08 richardleach

Note that we recovered some hint bits with 5d1739474d967de1ab8a8f88aa5eff250dbc0eab, so maybe it's fine to steal one bit for float?

I've not tested/benchmarked this on other code.

atoomic avatar Aug 05 '20 17:08 atoomic

Note that we recovered some hint bits with 5d17394, so maybe it's fine to steal one bit for float?

That recovered only a single bit which is now assigned to the feature mask, where it belongs.

Maybe we just need another hints word.

tonycoz avatar Sep 15 '20 05:09 tonycoz

@tonycoz, @richardleach, @atoomic, can we get an update on the status of this PR?

Thank you very much. Jim Keenan

jkeenan avatar Jan 26 '21 01:01 jkeenan

It's waiting on (likely) adding another hints word.

But I think that needs to wait on reducing the cost of COPs which those are embedded into.

Right now a COP is generated for every statement, but the information in each COP typically doesn't change much except for the line number. I've looked at adding an alternative COP which only has a line number, but this will break some backward compatibility at the XS level.

tonycoz avatar Jan 26 '21 22:01 tonycoz

I noticed that the regular versions of these functions do:

+      TARGn(left * right, 0);
+      SETs( TARG );

rather than:

-      SETn( left * right );

to try harder to avoid calling sv_setnv_mg.

richardleach avatar Sep 02 '22 13:09 richardleach

On Tue, 26 Jan 2021 at 23:32, Tony Cook @.***> wrote:

It's waiting on (likely) adding another hints word.

But I think that needs to wait on reducing the cost of COPs which those are embedded into.

Right now a COP is generated for every statement, but the information in each COP typically doesn't change much except for the line number. I've looked at adding an alternative COP which only has a line number, but this will break some backward compatibility at the XS level.

I'd like to hear more about this as it aligns with my interest in improving the quality of our error messages. If I can do any legwork here I'd be happy to hear an appraisal of the problem to get started with. Just mail me personally. You know where. :-)

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq avatar Sep 02 '22 15:09 demerphq

I'd like to hear more about this as it aligns with my interest in improving the quality of our error messages. If I can do any legwork here I'd be happy to hear an appraisal of the problem to get started with.

I've stalled on this a bit (error: stack overflow), but I did get a "small COP" largely implemented and I don't remember getting any crashes. I still needed to update caller() to understand the new COPs.

There may have been other problems though. I wasn't comfortable with the way I was detecting whether a small COP was possible, e.g. with code like:

line1;
line2;
if (...) { #line3
   line4;
   line5;
   no strict '...';
   line7;
}
line9;
line10;

Lines 1, 4, 7 and 9 needed full COPs, and I hadn't gotten to the point of checking that this was happening when it should.
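The rule behind that list is not spelled out in the thread. As a rough illustration only, the C sketch below assumes a statement needs a full COP when it is the first statement, or when its block depth or compile-time hints differ from those of the previous statement. The names `stmt`, `depth`, `hints` and `full_cop_lines` are hypothetical; none of this is taken from perl's source or from the less-cop branch.

```c
#include <stddef.h>

/* Hypothetical model of one statement; not perl's real COP layout. */
typedef struct stmt {
    unsigned line;   /* source line of the statement */
    unsigned depth;  /* block nesting depth (assumed to be tracked) */
    unsigned hints;  /* compile-time hints in effect, e.g. strict bits */
} stmt;

/*
 * Assumed rule: a full COP is needed for the first statement and whenever
 * depth or hints change relative to the previous statement. Every other
 * statement could get by with a line-number-only "small COP".
 * Writes the lines needing a full COP into out and returns how many.
 */
static size_t full_cop_lines(const stmt *stmts, size_t n, unsigned *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int full = (i == 0)
                || stmts[i].depth != stmts[i - 1].depth
                || stmts[i].hints != stmts[i - 1].hints;
        if (full)
            out[count++] = stmts[i].line;
    }
    return count;
}
```

Fed the example above (with line 6 being the `no strict`, which only changes the hints seen by line 7), this model selects lines 1, 4, 7 and 9. Whether the real detection can be this simple is exactly the open question raised in the comment.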

Even without adding a small COP we could improve memory usage a great deal by reference counting cop_warnings, and I think cop_file on threads. These are profligate users of memory: each COP has its own copy.

tonycoz avatar Sep 05 '22 01:09 tonycoz

Even without adding a small COP we could improve memory usage a great deal by reference counting cop_warnings, and I think cop_file on threads. These are profligate users of memory: each COP has its own copy.

In theory it should be pretty easy to use PL_strtab to do that if they are write-once. I will take a look. Do you have a branch for your small cop work?

demerphq avatar Oct 26 '22 06:10 demerphq

Do you have a branch for your small cop work?

It's very hacky and incomplete (and probably just plain broken), but https://github.com/Perl/perl5/tree/tonyc/less-cop

tonycoz avatar Oct 26 '22 22:10 tonycoz

On Thu, 27 Oct 2022 at 00:26, Tony Cook @.***> wrote:

Do you have a branch for your small cop work?

It's very hacky and incomplete (and probably just plain broken), but https://github.com/Perl/perl5/tree/tonyc/less-cop

Nice. For what it's worth, I've been looking at replacing cop_file with a HEK, which would allow the same code to be used to share the PV, threaded or otherwise.

Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq avatar Oct 27 '22 06:10 demerphq

On Sun, Sep 04, 2022 at 06:33:28PM -0700, Tony Cook wrote:

Even without adding a small COP we could improve memory usage a great deal by reference counting cop_warnings, and I think cop_file on threads. These are profligate users of memory: each COP has its own copy.

An alternative approach perhaps would be to move most of the COP fields out to a separate ref-counted struct shared by each of the COPs in a sequence, where those fields haven't changed, with each COP reduced to little more than cop_line plus a pointer to the new struct.
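To make the shape of that idea concrete, here is a minimal C sketch. It is an illustration under assumptions, not perl's actual `struct cop`: the type names `shared_cop_info` and `small_cop`, the helper functions, and the choice of which fields move into the shared struct are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared part: fields that rarely change between statements. */
typedef struct shared_cop_info {
    unsigned refcnt;  /* number of small COPs pointing at this struct */
    char *file;       /* stands in for cop_file */
    unsigned hints;   /* stands in for cop_hints */
    /* warnings bits, stash, etc. would live here as well */
} shared_cop_info;

/* Hypothetical per-statement part: little more than a line number. */
typedef struct small_cop {
    unsigned line;          /* stands in for cop_line */
    shared_cop_info *info;  /* shared by every COP in the same run */
} small_cop;

/* Allocate a shared struct with no owners yet; returns NULL on failure. */
static shared_cop_info *info_new(const char *file, unsigned hints)
{
    shared_cop_info *info = malloc(sizeof *info);
    if (!info)
        return NULL;
    info->file = malloc(strlen(file) + 1);
    if (!info->file) {
        free(info);
        return NULL;
    }
    strcpy(info->file, file);
    info->hints = hints;
    info->refcnt = 0;
    return info;
}

/* Attach a COP to the shared struct and take a reference. */
static void small_cop_init(small_cop *cop, unsigned line, shared_cop_info *info)
{
    cop->line = line;
    cop->info = info;
    info->refcnt++;
}

/* Drop a COP's reference; returns 1 if the shared struct was freed. */
static int small_cop_release(small_cop *cop)
{
    shared_cop_info *info = cop->info;
    assert(info != NULL && info->refcnt > 0);
    cop->info = NULL;
    if (--info->refcnt == 0) {
        free(info->file);
        free(info);
        return 1;
    }
    return 0;
}
```

In this model, a run of statements with unchanged file, hints and warnings would each cost one line number plus one pointer, and a new `shared_cop_info` would be allocated only when one of the shared fields changes.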

-- Never do today what you can put off till tomorrow.

iabyn avatar Nov 07 '22 12:11 iabyn

I have compiled perl from your branch and tested it on two pieces of code that use some float calculations.

First one is Algorithm::QuadTree::PP, which uses some (not much) float math in its circular shape finding routine. No improvement was seen.

The second one is more math-heavy, as it tries to find all border coordinates for a line segment. The heart of the function is implemented as follows:

my $coeff_x = ($position2->[1] - $position1->[1]) / ($position2->[0] - $position1->[0]);

my $checks_for_x = sub ($pos_x) {
	state $partial = $position1->[1] - $position1->[0] * $coeff_x;
	my $pos_y = $partial + $pos_x * $coeff_x;
	return ([$pos_x, $pos_y], [$pos_x - 1, $pos_y]);
};

my $checks_for_y = sub ($pos_y) {
	state $partial = $position1->[0] - $position1->[1] / $coeff_x;
	my $pos_x = $partial + $pos_y / $coeff_x;
	return ([$pos_x, $pos_y], [$pos_x, $pos_y - 1]);
};

my @coords = (
	(map { $checks_for_x->($_) } $position1->[0] + 1 .. $position2->[0]),
	(map { $checks_for_y->($_) } $position1->[1] + 1 .. $position2->[1])
);

Those two anonymous coderefs are then run for each integer coordinate of x and y. They are called about 20 times each and the entire function runs 40 thousand times per second, but I see no improvement on the benchmark if the function starts with use feature 'float'; (I expect this feature works in lexical scope).

I don't think I have anything else at the moment that has more float math in it.

bbrtj avatar Nov 26 '22 14:11 bbrtj

I don't think I have anything else at the moment that has more float math in it.

I suspect sub call overhead is drowning the math costs.

From memory I used the following to benchmark it:

use strict;
my $max_iter = 100;
++$|;
for my $iy (0 .. 1000) {
  my $y = -1 + 0.002 * $iy;
  for my $ix (0 .. 1000) {
    my $x = -1 + 0.002 * $ix;
    my $i = 0;
    my $xo = $x;
    my $yo = $y;
    my $iter = 0;
    while ($xo * $xo + $yo * $yo <= 10 && ++$iter < $max_iter) {
      ($xo, $yo) = ( $xo * $xo - $yo * $yo + $x, 2 * $xo * $yo + $y);
    }
  }
  print ".";
}
print "\n";

which I probably adapted from a C sample in Imager.

tonycoz avatar Nov 27 '22 22:11 tonycoz

I suspect sub call overhead is drowning the math costs.

With all math commented out (but variable declarations etc. left in), it runs about 20% faster, so I assume math takes about 16% of its runtime. When benchmarking your code I see 20-40% improvement, which would mean my code should run about 5-10% faster (taking into account your code also spends some of its runtime assigning variables etc.). You're right, that might not be enough to show on a benchmark.
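One way to sanity-check that estimate is Amdahl's law, sketched below in C. The function names are made up for illustration, and the inputs are only the rough figures quoted above. Treating the 20-40% whole-benchmark improvement as if it were the speedup of the math operations alone is an assumption that understates the math-only speedup, since the Mandelbrot loop also spends time on assignments and loop control.

```c
/*
 * Fraction of runtime spent in math, given how many times faster the code
 * runs with the math removed (1.2 means "about 20% faster").
 */
static double math_fraction(double speedup_without_math)
{
    return 1.0 - 1.0 / speedup_without_math;
}

/*
 * Amdahl's law: overall speedup when a fraction p of the runtime is made
 * s times faster and the remaining (1 - p) is unchanged.
 */
static double amdahl_speedup(double p, double s)
{
    return 1.0 / ((1.0 - p) + p / s);
}
```

With `math_fraction(1.2)` giving a math share of roughly 17%, `amdahl_speedup` with s between 1.2 and 1.4 works out to an overall gain of about 3% to 5%. That is a lower bound under the assumption stated above; a larger math-only speedup would push it toward the 5-10% estimated in the comment, which is still small enough to be lost in benchmark noise.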

bbrtj avatar Nov 28 '22 06:11 bbrtj

@tonycoz - I implemented RCPV filename and warnings bits, so we have reduced the size of COPs considerably (all together), so maybe we can reconsider making the hints bits bigger now?

Anyway, this PR is old and in conflict. Maybe we should get it rebased so it can be reconsidered?

demerphq avatar Feb 08 '23 07:02 demerphq

I'll look at rebasing it, though probably not today.

I'll look at the extra hints word too, though I'm not sure we'll store it for eval (see where doeval_compile() initializes PL_hints).

tonycoz avatar Feb 08 '23 23:02 tonycoz

It looks like you ran a performance test on a Mandelbrot set generator in Perl, comparing the performance with and without the float feature. The test showed that enabling float cut the run time by about 6 seconds (roughly 23%), with the script running in 19.751 seconds with the float option versus 25.752 seconds without it.

EdwardDanchetzNI avatar Mar 16 '23 04:03 EdwardDanchetzNI