kactl
Added a FFTPolynomial file
This PR is mainly to get a conversation started around how we should be representing our polynomials. Looking at MIT's and Adamant's implementations, there seem to be two primary design choices.
First, how do we represent our polynomials? MIT does it with a typedef vector<num> poly. Adamant does it with a struct. I'm leaning towards the vector<num>, simply because it means slightly less boilerplate. The primary advantage of a struct is that we can then define functions as methods of the struct instead of as free functions. For example, I think poly.inv().resize(K) reads more naturally than resize(inv(poly), K).
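For concreteness, here's a rough sketch of the two options (num stands for whichever coefficient type we settle on; all names here are illustrative, not taken from either implementation):

#include <vector>
using namespace std;
typedef long long num; // placeholder coefficient type

// Option 1 (MIT-style): plain typedef, operations as free functions.
typedef vector<num> poly;
poly inv(const poly &a); // used as resize(inv(p), K)

// Option 2 (Adamant-style): wrapper struct, operations as methods.
struct Poly {
	vector<num> c; // coefficients; c[i] multiplies x^i
	Poly inv() const; // used as p.inv().resize(K)
	Poly &resize(int k) { c.resize(k); return *this; }
};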
Second, how do we make our polynomials work across the 3 "fields" we support, namely doubles, ints with NTT, and ints with FFTMOD? MIT does this by wrapping int modulo operations in a struct. I think we can do this too (probably with ModularArithmetic.h and whatever API changes are necessary). With proper encapsulation, we should only need to swap out one function, namely the *= operator.
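Continuing the sketch above, the "swap one function" idea might look like this (assuming conv is the convolution provided by whichever of FFT.h / NTT.h / FFTMod.h is in use; this is how I'd imagine it, not settled code):

// The only field-specific operation: multiplication. Changing between
// doubles, NTT, and FFTMOD only changes the conv this delegates to;
// inv, log, exp, div, etc. are built purely on top of *=.
poly &operator*=(poly &a, const poly &b) { return a = conv(a, b); }
poly operator*(poly a, const poly &b) { return a *= b; }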
I've started implementing the other operations, but I wanted to submit a PR with the basics so we could settle on a structure.
Edit: closes #60.
TBH I don't have a strong opinion on this. It would be nice to merge this with the existing Polynomial.h; there's probably some shared code, and it would reduce the footprint of the existing code, which is nice since it's basically never used.
I was more thinking that after I finish writing this we get rid of the old Polynomial class entirely. I don't think there's much point for it.
There's not much in the old Polynomial class, it's not documented, etc. I made a new file simply so that things don't break while I write the new version, and so that I could submit some smaller PRs instead of one huge 100+ line file with like... 15 functions.
That's fair. Not that I would mind a large PR; it gives a nice impression of how we imagine things looking in the end. Also, note that PolyRoots.h depends on Polynomial.h and will need some small changes if it's removed.
Ok then, if you insist :^)
I rebased upon the FFTMod PR since this code depends on that.
I made a couple changes to ModularArithmetic.h to make it fit better with this code.
I also wouldn't bother commenting on small style things/trivial golfing for now; I'll fix those once this PR is more settled.
Oh whoops, didn't see your comment. Anyway, here are some misc changes that you probably don't want to miss.
I've finished porting all the functions (or their equivalents) from MIT, and added fuzz tests/benchmarking as well. I'd appreciate notes on what cases I should be testing in my benchmarks.
Current benchmarks look like this (all tests passed):

| operation  | mine   | MIT    |
|------------|--------|--------|
| sub        | 2ms    | 1ms    |
| add        | 3ms    | 1ms    |
| div        | 612ms  | 503ms  |
| mod        | 730ms  | 685ms  |
| inv        | 1033ms | 701ms  |
| derivative | 2ms    | 2ms    |
| integral   | 21ms   | 8ms    |
| log        | 1293ms | 1076ms |
| exp        | 2952ms | 2642ms |
| pow        | 0ms    | 0ms    |
| eval       | 808ms  | 911ms  |
| interp     | 1080ms | 1306ms |
Notably, inverse, eval, and interp show significant performance differences. I'll be taking a look at those next.
We'll probably also want to split these functions up into files.
EDIT: Updated with revised benchmarks.
The performance differences probably all arise from doing twice as many FFTs in inverse.
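(Aside, since it's implicit here: inverse is typically computed by Newton iteration, where if b ≡ a^{-1} (mod x^k) then b(2 - ab) ≡ a^{-1} (mod x^{2k}); how many FFTs each doubling step costs depends on how many transforms are shared between the two products, which is where a 2x factor can appear or disappear.)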
Do you handle pow(*, 0) correctly?
> Do you handle pow(*, 0) correctly?
I believe so.
> The performance differences probably all arise from doing twice as many FFTs in inverse.
Ok, so I moved some stuff around (namely in the benchmarks, where I was constantly regenerating vectors). I'll update my comment with the new numbers. Notably, it seems the current inverse is only about 40% slower, not 100%.
So the runtime disparities for interp/eval are now the biggest differences.
You may want to document various assumptions later. Off the top of my head:
- inverse expects a[0] != 0
- log expects a[0] == 1
- exp expects a[0] == 0
- pow({0}, 0) == {0}
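(For context on why these hold; this is my reading, not something stated above: inverse needs a[0] != 0 because the constant term must be invertible; log(a) is computed as the integral of a'/a, so a[0] == 1 makes the result's constant term 0; and exp inverts log, so it expects a[0] == 0.)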
> Document various assumptions
Yeah that's definitely something that should be done. Splitting these into multiple files should luckily give us plenty of space for documentation about requirements and such.
Seems like the main cause of the relatively slow performance is the big discrepancy in inverse for small inputs. For large inputs the difference is only about 40%, but for inputs of size 10 it's around 200%.
I suspect the right thing to do is to have a separate naive_mul for small inputs.
Now that we've reached what I consider reasonable feature/performance parity, I'll start formatting things into files/writing some documentation next.
I suspect you can get some reasonable gains on small cases by reusing the rt and rev arrays (store them globally). I'm not a huge fan of having naive_mul, both on principle and because it's a bunch of lines, but idk.
So the current FFTMod PR has some work on reusing the rt array - I think the rev array needs to be recomputed for each different size.
It seems that for arrays around size 10, there's about a 50% gain in doing so (so I don't think it'll close the gap).
I do agree that it's kind of an ugly hack :'(. Depending on how much we think performance matters for eval/interp, we might not even want to add it. However, a 2x performance gain is definitely something to be considered.
I also think that doing naive_mul isn't too bad. It adds about 6 lines total:
poly &operator*=(poly &a, const poly &b) {
	if (sz(a) + sz(b) < 100) { // small case: quadratic schoolbook multiply
		poly res(sz(a) + sz(b) - 1);
		rep(i,0,sz(a)) rep(j,0,sz(b))
			res[i + j] = res[i + j] + a[i] * b[j];
		return a = res;
	}
	return a = conv(a, b); // large case: FFT-based convolution
}
You can reuse the rev array with some bitshifting, see the MIT implementation (https://github.com/ecnerwala/icpc-book/blob/269f583e2b2a8497caac1f488ce3e7dac2f52ba3/content/numerical/fft.cpp#L67).
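For reference, the trick looks roughly like this (hypothetical names, paraphrasing the linked code rather than quoting it; rep is kactl's loop macro). The bit-reversal table is computed once for the largest supported size, and the table for any smaller size falls out by shifting, since reversing k bits equals reversing L bits and dropping the low L - k of them:

const int L = 19; // assumed maximum lg of any transform size
vector<int> rev(1 << L);
void initRev() { // L-bit reversal of every index below 2^L
	rep(i,1,1 << L) rev[i] = (rev[i >> 1] >> 1) | ((i & 1) << (L - 1));
}
int revk(int i, int k) { return rev[i] >> (L - k); } // k-bit reversal, k <= L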
It seems to make no impact on performance (perhaps even a negative impact).
I've reorganized the functions into 5 files, with each file's dependencies listed beside it (including transitive dependencies). I've tried to balance minimizing the number of extraneous dependencies, leaving enough space for documentation, and grouping things together by theme.
- [1] PolynomialBase
- [2] PolynomialMod (depends on [1])
- [3] PolynomialPow (depends on [1])
- [4] PolynomialEval (depends on [1], [2])
- [5] PolynomialInterpolate (depends on [1], [2], [3], [4]; wouldn't need [3] other than for deriv)
I removed the euclid dependency in ModularArithmetic.h for two reasons:
- It makes the total Polynomial code longer :^)
- I suspect that there's not much point in supporting division by values that are merely coprime to a composite modulus. I can't think of places where one might want to use ModularArithmetic on a ring that isn't a finite field. If someone really needs to, it's easy enough to change it.
If you want, here's my modinv:
int modinv(int a, int m) { return a == 1 ? 1 : m - 1ll * modinv(m % a, a) * m / a; }
Requires 1 <= a < m and gcd(a,m) == 1
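As a sanity check, tracing modinv(5, 7): since 7 % 5 = 2, we need modinv(2, 5), which recurses to modinv(1, 2) = 1 and gives 5 - 1 * 5 / 2 = 3; then modinv(5, 7) = 7 - 3 * 7 / 5 = 7 - 4 = 3, and indeed 5 * 3 = 15 ≡ 1 (mod 7).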
Also, empirically, modinv is somewhere between 1x and 2x as fast as modular exponentiation.