gmp icon indicating copy to clipboard operation
gmp copied to clipboard

When appending to arrays, gmp.Int values change for large number arrays

Open eyxue opened this issue 8 years ago • 6 comments

We're using gmp to run on large inputs, and we have arrays with thousands of elements at a time in our code. When we write these arrays to files, we find that the values of some of the elements have changed.

We tested our by code by reading in an input file of large numbers, casting them to gmp.Int, storing them in an array, and writing them to an output file. This works for an array of ~1400 numbers, but if the array gets any larger than that the values in the array change.

Below we print the first element of our array each time a new element gets appended. As the size goes past 1435, the value of the first value changes, and it changes again when the size goes past 1437:

output: 1434 27928410756179523138881080989986005991789933932188228018115755918494634799929873910027825747854697455979387110847058629350347288500624884424807086906801594642406392448568530327829083514883524712985492538589993807582131300292739787897589360129490451849769622648595726892334513355855504557741514903175075958458960111055024641406976670602678887952234068019841236549447926340755063903373950684252611325596552928912480285833159841819645546289989159869982214159616531691638376958340264087137749469269647555189450117026947774590910035240238694361652478067361121651905941283054193444516391789652433009672685099320489013369511

output: 1435 27928410756179523138881080989986005991789933932188228018115755918494634799929873910027825747854697455979387110847058629350347288500624884424807086906801594642406392448568530327829083514883524712985492538589993807582131300292739787897589360129490451849769622648595726892334513355855504557741514903175075958458960111055024641406976670602678887952234068019841236549447926340755063903373950684252611325596552928912480285833159841819645546289989159869982214159616531691638376958340264087137749469269647555189450117026947774590910035240238694361652478067361121651905941283054193444516391789652433009672685099320489013369511

output: 1436 27928410756179523138881080989986005991789933932188228018115755918494634799929873910027825747854697455979387110847058629350347288500624884424807086906801594642406392448568530327829083514883524712985492538589993807582131300292739787897589360129490451849769622648595726892334513355855504557741514903175075958458960111055024641406976670602678887952234068019841236549447926340755063903373950684252611325596552928912480285833159841819645546289989159869982214159616531691638376958340264087137749469269647555189450117026947774590910035240238694361652478067361121651905941283054193424735876961338910144554408906597071520268288

output: 1437 27928410756179523138881080989986005991789933932188228018115755918494634799929873910027825747854697455979387110847058629350347288500624884424807086906801594642406392448568530327829083514883524712985492538589993807582131300292739787897589360129490451849769622648595726892334513355855504557741514903175075958458960111055024641406976670602678887952234068019841236549447926340755063903373950684252611325596552928912480285833159841819645546289989159869982214159616531691638376958340264087137749469269647555189450117026947774590910035240238694361652478067361121651905941283054193424735876961338910144554408906597071520268288

output: 1438 23937065258617293607419221241376749238853786485747685256178538183999977338295726770968837510115411713639449942813542410122781003049158815334638386757489378865115690993509412251696127283961449052609910367016753503276717219374706887218515185370729246298124537836902676996868643830290196728475836135942660102147078687203464316120920925288760650180847521449886391007761057733951927560142110014087955337930094606086003264404490081726797410733774516424213240813188435922154321511108819176444188185518457225955025510879142933128937566544835525378040039436242357463757111369800670593394434569238912921305016540547229516010929

We tried the exact same test with math/big, and this was not an issue. Can you please take a look at this?

eyxue avatar May 13 '17 21:05 eyxue

Can you write a testcase which shows this effect and attach to this issue or on play.golang.org? Or alternatively can I try the code you are using?

ncw avatar May 14 '17 09:05 ncw

I attached a test case below. In short this appends the same number to an array prints out the first element with each loop. The first element will sometimes change multiple times throughout the loops.

We ran this same test case with math/big, and we did not encounter these problems.

Thank you!

On Sun, May 14, 2017 at 5:06 AM, Nick Craig-Wood [email protected] wrote:

Can you write a testcase which shows this effect and attach to this issue or on play.golang.org? Or alternatively can I try the code you are using?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ncw/gmp/issues/4#issuecomment-301299797, or mute the thread https://github.com/notifications/unsubscribe-auth/AJHzGHXjymMBLxK82RgOwCF82tc-MAk6ks5r5sQVgaJpZM4NaMsV .

eyxue avatar May 14 '17 19:05 eyxue

package main

import (
    "fmt"
    "github.com/ncw/gmp"
    // "math/big"
)

func test_gmp_array() {
    index := 0
    output_array := []gmp.Int{}
    for index < 5000 {
        var newnum gmp.Int
        newnum.SetString("27928410756179523138881080989986005991789933932188228018115755918494634799929873910027825747854697455979387110847058629350347288500624884424807086906801594642406392448568530327829083514883524712985492538589993807582131300292739787897589360129490451849769622648595726892334513355855504557741514903175075958458960111055024641406976670602678887952234068019841236549447926340755063903373950684252611325596552928912480285833159841819645546289989159869982214159616531691638376958340264087137749469269647555189450117026947774590910035240238694361652478067361121651905941283054193444516391789652433009672685099320489013369511", 10)
        output_array = append(output_array, newnum)
        fmt.Println("array_size: ", len(output_array))
        fmt.Println(&output_array[0])
        index += 1
    }
}


func main() {
    test_gmp_array();
}

eyxue avatar May 14 '17 19:05 eyxue

Copying my answer to this issue on the golang-nuts mailing list:

There's a finalizer being set on *gmp.Int when SetString() is called. Since newnum is a gmp.Int and not a *gmp.Int and newnum is copied into output_array, the pointer with the finalizer set on it is no longer referenced and the finalizer is eligible to be run when the next GC occurs.

If you put a manual call to runtime.GC() in the loop the issue presents itself much more quickly.

The issue does not occur if you avoid copying newnum by storing pointers in the slice. https://play.golang.org/p/l7kXpxArc1

I'd add a note to your docs that it's not safe to copy your Int and Rat types.

vcabbage avatar May 14 '17 20:05 vcabbage

@vcabbage thanks for working out this issue.

I'd add a note to your docs that it's not safe to copy your Int and Rat types.

That breaks the compatibility between the Int and Rat types and the math/big versions which is unfortunate...

So I'd like them to be copiable if possible...

That could be done with another layer of indirection so the Int type could become something like

// mpz holds the gmp type.  This is wrapped in a go type so we can
// attach initializers to it.
type mpz struct {
	i C.mpz_t
}

// An Int represents a signed multi-precision integer.
// The zero value for an Int represents the value 0.
type Int struct {
	*mpz
}

// Finalizer - release the memory allocated to the mpz
func intFinalize(z *mpz) {
	runtime.SetFinalizer(z, nil)
	C.mpz_clear(&z.i[0])
}

// initialized returns whether z has had doinit() called on it
func (z *Int) initialized() bool {
	return z.mpz != nil
}

// Int promises that the zero value is a 0, but in gmp
// the zero value is a crash.  To bridge the gap, the
// init bool says whether this is a valid gmp value.
// doinit initializes z.i if it needs it.
func (z *Int) doinit() {
	if z.initialized() {
		return
	}
	z.mpz = new(mpz)
	C.mpz_init(&z.i[0])
	runtime.SetFinalizer(z.mpz, intFinalize)
}

I've pushed this to a branch (just for Int) if anyone wants to have a go with it.

What do you all think - sensible approach?

This fixes the test @eyxue provided.

ncw avatar May 17 '17 09:05 ncw

Seems like a good solution to me. 👍

vcabbage avatar May 17 '17 17:05 vcabbage