plot icon indicating copy to clipboard operation
plot copied to clipboard

plot/plotter: NewHist(binPoints) panic when a small negative number is used

Open majiaxin110 opened this issue 1 year ago • 7 comments

What are you trying to do?

I am trying to analyze the distribution of a series of numbers by plotter.NewHist

What did you do?

Here is a minimal example of the code I used:

func TestA(t *testing.T) {
	data := []float64{203.4, -9.223372036854776e+17, 220.4}
	_, err := plotter.NewHist(plotter.Values(data), 10)
	if err != nil {
		fmt.Println(err)
	}
}

What did you expect to happen?

some bin are returned and there is no error or panic

What actually happened?

funcion binPoints panics with following message panic: 203.4, xmin=-9.223372036854776e+17, xmax=220.4, w=9.223372036854779e+16, bin=10, n=10

What version of Go and Gonum/plot are you using?

Go 1.23 gonum.org/v1/plot v0.15.0

Also reproduced in Go 1.20 gonum.org/v1/plot v0.0.0-20190515093506-e2840ee46a6b

Annotation

When debugging, I find the bin is set to 10 wrongly. I doubt it is the computing accuracy problem for float64. Maybe if x == xmax (line 213) should be changed to if bin >=n?

majiaxin110 avatar Jan 31 '25 09:01 majiaxin110

This is working as intended; bins cannot hold negative values. You will need to sanitize the data you provide to the call.

kortschak avatar Jan 31 '25 09:01 kortschak

This is working as intended; bins cannot hold negative values. You will need to sanitize the data you provide to the call.

Sorry for my imprecise description. The fact is I observed a continuous and significant number of panics in production. At the current stage, I suspect bin := int((x - xmin) / w) may cause floating-point calculation error even for positive data. And if x == xmax (line 213) cannot make bin < n in all cases. I will try to collect more logs and provide more details in the future.

On another note, could you please list relevant document or comment for the fact that bins cannot hold negative values? I have read https://pkg.go.dev/gonum.org/v1/plot/plotter?utm_source=godoc#NewHistogram but it seems there is no limitation description for negative values. Thanks!

majiaxin110 avatar Feb 01 '25 05:02 majiaxin110

The documentation for NewHistogram says "Each y value is assumed to be the frequency count for the corresponding x." It is not meaningful for a value to be negative here.

kortschak avatar Feb 01 '25 05:02 kortschak

Sorry, I've taken a closer look and essentially you've hit a limit of float64 precision. I'll see if there is anything we can do about it.

kortschak avatar Feb 01 '25 05:02 kortschak

The issue is that (x-xmin)/w becomes n for your range. The bin calculation depends on all values lower than the max to be some epsilon smaller than the max, but this does not happen here. We could alter the max bin allocation logic at L213-L215, but that would be brittle. The reality here is that you are expecting too much of floats and should try to live within the 1e-16—1e16 range.

kortschak avatar Feb 01 '25 06:02 kortschak

The issue is that (x-xmin)/w becomes n for your range. The bin calculation depends on all values lower than the max to be some epsilon smaller than the max, but this does not happen here. We could alter the max bin allocation logic at L213-L215, but that would be brittle. The reality here is that you are expecting too much of floats and should try to live within the 1e-16—1e16 range.

Thanks for your reply! I fully agree it is due to the precision limits of float64. Staying within 1e-16—1e16 is rational and it works in more cases. I think it might be better if there is any description or tip about it in documentation like NaN, especially if there are no plans to change it.

majiaxin110 avatar Feb 01 '25 07:02 majiaxin110

I think the best we can do is improve the information in the panic. Fully describing the limitations of IEEE 754 is out of scope.

kortschak avatar Feb 01 '25 07:02 kortschak