algebra icon indicating copy to clipboard operation
algebra copied to clipboard

Make FFTs nlog(d)

Open ValarDragon opened this issue 5 years ago • 5 comments

FFTs in the library are all nlog(n). However they could be nlog(d), which is an efficiency improvement when doing polynomial multiplication.

This can be taken advantage of in two possible ways:

  • Split up the FFT into n/d independent FFTs operating over the same polynomial, but evaluating on distinct cosets. Both of these sub-FFTs are dlog(d). At the end concatenate these two vectors.
  • Skip the first few rounds of butterflying / arithmetic in the current FFT algorithm, and just copy the polynomial coefficients into the correct positions. This is implemented in libiop.

This probably doesn't matter for the current pairing based SNARKs as they are dominated by MSM time.

ValarDragon avatar Jan 14 '20 23:01 ValarDragon

I have been intending to refactor the FFT infrastructure a bit, but I don't have too much experience with this; any help would be greatly appreciated!

Other optimizations are also appreciated!

Pratyush avatar Jan 20 '20 16:01 Pratyush

What is d? What is the progress on this issue?

jon-chuang avatar Apr 03 '20 06:04 jon-chuang

d is the degree of the polynomial being evaluated, and N is the size of the domain

Pratyush avatar Apr 03 '20 06:04 Pratyush

@ValarDragon 30-70 split is not really "dominated by MSM". A 10% improvement in FFT would translate to 3% overall improvement.

What I don't understand is what these cosets/domains are. I understand it when there are multiple polynomials. However, why isn't the domain just d? Don't we always mod out by the same (X^d-1)?

jon-chuang avatar Apr 03 '20 06:04 jon-chuang

The FFT/MSM split depends on your circuit size and protocol choice. Agreed that improving it is important, especially at large circuit sizes.

There are multiple polynomials. RS-IOP/AHP based SNARKs commit to many polynomials, and typically do pointwise multiplications on them, hence requiring the larger domain size.

Because your protocols typically divide by vanishing polynomials, you often can't evaluate your polynomials on the systematic domain, and have to evaluate them on cosets.

ValarDragon avatar Apr 03 '20 09:04 ValarDragon

This would help any time we do a polynomial multiplication, since the domain is necessarily equal to the sum of the degrees of the involved polynomial, and is hence usually at least double their degree.

Pratyush avatar Sep 29 '22 00:09 Pratyush

@andrewmilson this change becomes possible with your work in #487; if you would like to take a stab at implementing the first approach @ValarDragon mentioned, that would be fantastic!

Basically, each evaluation domain of size n can be decomposed into n/d copies of itself, multiplied by a correction factor. Eg, for n = 4, the domain D_4 equals [1, ω_4, ω_4^2, ω_4^3]. For n = 2, the domain D_2 equals [1, ω_2]. Since ω_2^2 = 1 = ω_4^4, we have that ω_2 = ω_4^2. Hence, we can decompose D_4 into D_2 and a coset domain of D_2 with offset ω_4.

Pratyush avatar Sep 29 '22 00:09 Pratyush

Thanks @Pratyush. Yeah keen to take a stab!

andrewmilson avatar Sep 29 '22 01:09 andrewmilson

Awesome! Let me know if you have any questions about the approach; you can ask here or on Discord.

On Sep 28, 2022, at 6:01 PM, Andrew Milson @.***> wrote:

Thanks @Pratyush https://github.com/Pratyush. Yeah keen to take a stab!

— Reply to this email directly, view it on GitHub https://github.com/arkworks-rs/algebra/issues/83#issuecomment-1261619865, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAYSJ6VLKV2XS7TV5SLCEETWATS5ZANCNFSM4T4D4YUQ. You are receiving this because you were mentioned.

Pratyush avatar Sep 29 '22 01:09 Pratyush

@Pratyush unless I'm missing something I think it might be easier going straight to the second approach @ValarDragon mentioned.

With the first approach It's easy to obtain the values for the smaller FFTs evaluated on the coset domains but transposing the values so they are in order is not so easy. I haven't spent too much time looking into it but I don't think it's trivial to do without allocating extra memory.

andrewmilson avatar Sep 29 '22 12:09 andrewmilson

You're right that the shuffles to put things in place are somewhat non-trivial, and might cause bad non-locality. It would be good to check what the situation would be like for the other approach.

Re: extra memory, we will need to allocate extra memory either way, since the output (evaluations) is larger than the input (polynomial)

Pratyush avatar Sep 29 '22 20:09 Pratyush

With the first approach I mean additional memory on top of resizing the coefficients to the size of the domain.

The second approach has better locality and no need to allocate more memory than necessary. I put together a quick draft PR #493 just for ideas. I know you mentioned above you are "intending to refactor the FFT infrastructure". I'd be keen to hear what your thoughts are here. I'm more than happy to take the implementation further, just let me know.

andrewmilson avatar Sep 30 '22 02:09 andrewmilson