gnark-crypto icon indicating copy to clipboard operation
gnark-crypto copied to clipboard

Consider ditching Montgomery form for Small Fields

Open Tabaie opened this issue 5 months ago • 1 comments

func (z *Element) mulNaive(a, b *Element) *Element {
	z[0] = uint32((uint64(a[0]) * uint64(b[0])) % q)
	return z
}

func BenchmarkMulNaive(b *testing.B) {
	var a, z Element
	a.MustSetRandom()
	z.MustSetRandom()
	b.ResetTimer()

	for range b.N {
		z.mulNaive(&a, &z)
	}
}

func BenchmarkMul(b *testing.B) {
	var a, z Element
	a.MustSetRandom()
	z.MustSetRandom()
	b.ResetTimer()

	for range b.N {
		z.Mul(&a, &z)
	}
}

On an M1 (ARM64) MacBook Pro I get the following:

BenchmarkMulNaive-8                                     203830368                5.818 ns/op
BenchmarkMul-8                                          195042295                6.128 ns/op

A 5% speedup in favor of the "naive" option, which apart from performance concerns is attractive due to its simplicity.

A major caveat here is that while AVX512 has shift instructions ("division by R" in Montgomery reduction), it doesn't seem to support generic integer division, which will have to be hacked around using floating point division. (i.e. divide as floats, truncate the quotient, multiply ad subtract). So it is possible that this change would speed up scalar ops but hurt vector ops. Barrett reduction on the other hand, which works on the naive representation, can be vectorized well.

Tabaie avatar Aug 06 '25 17:08 Tabaie

cc @DavePearce

Tabaie avatar Aug 06 '25 18:08 Tabaie