plonky2 icon indicating copy to clipboard operation
plonky2 copied to clipboard

Optimize reduce_polys_base

Open Daniel-Aaron-Bloom opened this issue 1 year ago • 2 comments

Updated and improved version of #1034

Only requires a single collect of the powers.

Daniel-Aaron-Bloom avatar Apr 17 '24 19:04 Daniel-Aaron-Bloom

Broadly speaking I'm good with this. We should do some benchmarking though

muursh avatar Apr 18 '24 12:04 muursh

Target branch changed to allow us to benchmark this more easily

muursh avatar Apr 22 '24 16:04 muursh

@Daniel-Aaron-Bloom we benchmarked this earlier and this PR is slower.

ref branch: 3m43.452s test branch: 4m8.345s

muursh avatar Apr 24 '24 14:04 muursh

What a shame. Someone should probably close #1034 then, as this is just an updated version of that.

Daniel-Aaron-Bloom avatar Apr 24 '24 22:04 Daniel-Aaron-Bloom

@Daniel-Aaron-Bloom Sorry for missing out the updated conversation. Outside of the changes from #1034, I noticed you parallelized a loop in this PR, which I suspect may be hindering perfs and explaining part of the overhead (IIRC the initial version of this proposal in #1034 was yielding a couple percents improvement). Is there any particular reason why this extra change was done?

Nashtare avatar Apr 26 '24 15:04 Nashtare

I'm not sure what "extra change" you're referring to. The original change parallelizes a single loop in reduce_polys_base and modifies the call site of that function to support the required parallel iterator input.

My change is the same in that regard. The only difference is my change removes one call to collect that the original change tried to remove but was unsuccessful.

I'd be happy to reintroduce that collect if it's removal is causing the performance dip.

Daniel-Aaron-Bloom avatar Apr 26 '24 16:04 Daniel-Aaron-Bloom

I am talking about your additional par_iter in plonky2/src/fri/oracle.rs which wasn't part of the initial PR.

Nashtare avatar Apr 26 '24 16:04 Nashtare

Ah, so then we are talking about the extra collect, as the mapped Vec has to be able to be parallelizable to be passed into the reduce. Sure, I can remove that and see if collect peforms better.

Daniel-Aaron-Bloom avatar Apr 26 '24 16:04 Daniel-Aaron-Bloom

Oh, but to answer your question directly, the reason was to incorporate this feedback.

Daniel-Aaron-Bloom avatar Apr 26 '24 17:04 Daniel-Aaron-Bloom