Replace plotter.XYs with slices

Open btracey opened this issue 8 years ago • 22 comments

In my experience, most data exists as x, y []float64, and not as []struct{X,Y float64}. plotter should replace these types

See discussion https://groups.google.com/d/msg/gonum-dev/FrGpgA-pZQs/FCy8op2JCwAJ

btracey avatar May 31 '16 21:05 btracey

Now if we had tables...

kortschak avatar May 31 '16 23:05 kortschak

wrt tables, as far as gonum/plot is concerned, everything is a float64. so, perhaps we could get away with a float64-based ndim array? but then, is it really the best option to couple gonum/plot with that specific implementation?

should we give @Kunde21's numgo.Array64 a go?

relying on []float64 seems simpler. row-wise data isn't very cache friendly anyways...

sbinet avatar Jun 01 '16 09:06 sbinet

That was not a serious suggestion.

kortschak avatar Jun 01 '16 09:06 kortschak

Why not just add the code:

type XYArray struct {
   X, Y []float64
}

func (xy XYArray) Len() int {
   return len(xy.X)
}

func (xy XYArray) XY(i int) (x,y float64) {
   return xy.X[i], xy.Y[i]
}

Then one could just do

vals := XYArray{X: x, Y: y}

and directly add vals to a plotter.

ctessum avatar Jun 01 '16 15:06 ctessum
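
The XYArray suggestion above can be turned into a small self-contained sketch. It assumes that the XYer interface consists of exactly the two methods shown (Len and XY); the local XYer definition and the yRange consumer are hypothetical stand-ins so the example runs without importing the plot packages.

```go
package main

import "fmt"

// XYer is a local stand-in, assumed to mirror plotter.XYer.
type XYer interface {
	Len() int
	XY(i int) (x, y float64)
}

// XYArray adapts two parallel slices to XYer, as suggested in the thread.
type XYArray struct {
	X, Y []float64
}

func (xy XYArray) Len() int { return len(xy.X) }

func (xy XYArray) XY(i int) (x, y float64) { return xy.X[i], xy.Y[i] }

// yRange is a hypothetical consumer standing in for a plotter constructor:
// it only relies on the XYer methods, never on the concrete type.
func yRange(d XYer) (lo, hi float64) {
	if d.Len() == 0 {
		return 0, 0
	}
	_, lo = d.XY(0)
	hi = lo
	for i := 1; i < d.Len(); i++ {
		_, y := d.XY(i)
		if y < lo {
			lo = y
		}
		if y > hi {
			hi = y
		}
	}
	return lo, hi
}

func main() {
	x := []float64{0, 1, 2}
	y := []float64{3, -1, 4}

	vals := XYArray{X: x, Y: y}
	lo, hi := yRange(vals)
	fmt.Println(lo, hi)
}
```

Because XYArray only stores the slice headers, building it costs nothing beyond the struct itself; the caller's x and y are not copied at this point.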

https://godoc.org/github.com/btracey/myplot#VecXY :)

Still, I think the slice representation is more natural for gonum/plot. I think it's more common for users of the code, and it's easier to add in more information (say, z data). Frequently the data does need to exist as a slice even internal to plot, see for example the code for NewErrorPoints

btracey avatar Jun 01 '16 15:06 btracey

So is the question, then, whether to replace the XYs type with VecXY or XYArray, or whether to replace the XYer interface with a concrete type in function arguments? i.e.

NewScatter(xys XYer) 

would become

NewScatter(x, y []float64)

To me, the first option seems like a good idea. I see a tradeoff with the second option: making the change would make things easier for a lot of use cases (i.e., where the data to be plotted is already in x and y arrays), but it would make things more difficult for others (e.g., plotting geographic locations, which typically are in the form struct{X, Y float64}).

I would lean toward keeping the current signature for the function arguments but replacing XYs with VecXY or XYVector as that seems to be most in line with the 'do less, enable more' philosophy, and allows the user to make decisions about their data format rather than having the decisions made for them by the plotting package.

ctessum avatar Jun 01 '16 16:06 ctessum

I think your suggestion is reasonable, but just a bit of devil's advocate: Is it actually true that geographic data usually come as struct{X, Y float64}? Doesn't it usually come as struct{Lat, Long, Altitude, {other data} float64}? It doesn't affect your argument, but it does mean that some kind of type munging is always necessary for that kind of data.

btracey avatar Jun 01 '16 16:06 btracey

Yes, although I think that supports my point rather than refutes it. Having an interface function argument means that the user can just implement the XYer interface for whatever complicated data type they have (which, as shown above, takes 6 lines of code), rather than having to munge it. So, for example:

type Point struct {
   Lat, Lon, Altitude, other, variables float64
}

type Points []Point

func (p Points) Len() int {
   return len(p)
}

func (p Points) XY(i int) (x,y float64) {
   return p[i].Lon, p[i].Lat
}

ctessum avatar Jun 01 '16 16:06 ctessum

Yes, I agree it supports your point.

btracey avatar Jun 01 '16 16:06 btracey

Maybe I'm confused, because I don't understand the problem. Doesn't everything take the XYer interface? You can implement that however you like. That's why I made it an interface.

eaburns avatar Jun 01 '16 16:06 eaburns

I think this ticket should be about providing an easy way to create XYer from X and Y.

This can be done by adding an XYArray type as proposed by @ctessum (the name requires some thought though).

Or via function:

func ZipXY(x []float64, y []float64) XYer

Adding some convenience functions like:

NewScatterXY(x []float64, y []float64) (*Scatter, error)

might be a good idea too.

kostya-sh avatar Jun 02 '16 07:06 kostya-sh

@kortschak wrt tables and what not: I was browsing the interwebs the other day and stumbled upon @aclements' playground for providing a gg-inspired toolkit for Go: https://godoc.org/github.com/aclements/go-gg

especially: https://godoc.org/github.com/aclements/go-gg/table#Table

It's an interesting read :)

sbinet avatar Jun 07 '16 13:06 sbinet

That is something that looks like what we would be likely to implement as part of a data frame package, but not what I was talking about here.

kortschak avatar Jun 07 '16 23:06 kortschak

I'm really not sure this is worth it. The code for it is just

type zip struct { x, y []float64 }

func (z zip) Len() int                { return len(z.x) }
func (z zip) XY(i int) (x, y float64) { return z.x[i], z.y[i] }

This is less than implementing sort.Interface.

kortschak avatar Jun 08 '16 00:06 kortschak

you could flip it around: "sort" provides implementations for sort.Strings, sort.Float64s and sort.Ints and it's in the stdlib (so the bar for inclusion is higher.)

adding just one exported function (func ZipXY(x, y []float64) XYer) is IMHO hitting the sweet spot.

sbinet avatar Jun 08 '16 07:06 sbinet

I have to agree with @sbinet about the usefulness of a ZipXY(x, y []float64) XYer function. A portion of the users will have existing slices of structs that can easily implement XYer, but there's a lot of data that starts in the form of one X vector and many Y's. That ends up requiring a zip-style function in each and every project that doesn't have a clean slice of structs which can implement XYer directly. Yes, it's a helper function, but it's not even close to an esoteric use.

Edit: By comparison, sort only needs an interface implementation, whereas a set of Y slices requires a new constructor or closure before implementing the XYer interface.

Kunde21 avatar Jun 08 '16 07:06 Kunde21
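
The "one X vector and many Y's" case described above can be sketched as follows. This is only an illustration: ZipXY follows the signature proposed in this thread and is not an existing plotter function, the local XYer interface is assumed to mirror plotter.XYer, and the series names and values are made up.

```go
package main

import "fmt"

// XYer is a local stand-in, assumed to mirror plotter.XYer.
type XYer interface {
	Len() int
	XY(i int) (x, y float64)
}

// zip pairs two parallel slices without copying them.
type zip struct{ x, y []float64 }

func (z zip) Len() int                { return len(z.x) }
func (z zip) XY(i int) (x, y float64) { return z.x[i], z.y[i] }

// ZipXY uses the signature proposed in the thread (hypothetical helper).
// It performs no length check; Len reports len(x).
func ZipXY(x, y []float64) XYer { return zip{x, y} }

func main() {
	// One shared X vector and several Y series (made-up data).
	x := []float64{0, 1, 2}
	names := []string{"linear", "squared"}
	ys := [][]float64{{0, 1, 2}, {0, 1, 4}}

	for i, y := range ys {
		d := ZipXY(x, y) // each series reuses the same x slice
		_, last := d.XY(d.Len() - 1)
		fmt.Printf("%s: %d points, last y = %g\n", names[i], d.Len(), last)
	}
}
```

Each resulting value could then be handed to anything that accepts an XYer, so the per-project boilerplate shrinks to one call per series.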

What does the function give you? If the type is exported you get exactly the same functionality with a smaller API surface, unless there is a proposal that ZipXY panics when len(x) != len(y), which the implementation may also do (though later)* and can be made to do explicitly via the Len method.

* This is another issue I'd like to address again - that all data are copied by plot.

kortschak avatar Jun 08 '16 08:06 kortschak
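
The point above about making the length check explicit in Len could look roughly like this. It is a sketch under assumptions: CheckedXY is a hypothetical name, and panicking (rather than, say, returning the shorter length) is just one possible policy.

```go
package main

import "fmt"

// CheckedXY is a hypothetical exported type wrapping parallel slices.
type CheckedXY struct {
	X, Y []float64
}

// Len panics if the two slices disagree in length, so a mismatch is
// reported as soon as a consumer asks for the number of points.
func (c CheckedXY) Len() int {
	if len(c.X) != len(c.Y) {
		panic(fmt.Sprintf("CheckedXY: length mismatch: len(X)=%d, len(Y)=%d",
			len(c.X), len(c.Y)))
	}
	return len(c.X)
}

// XY returns the i-th pair.
func (c CheckedXY) XY(i int) (x, y float64) { return c.X[i], c.Y[i] }

func main() {
	ok := CheckedXY{X: []float64{1, 2}, Y: []float64{3, 4}}
	fmt.Println("points:", ok.Len())

	// Recover only to show the panic message in this demo.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	bad := CheckedXY{X: []float64{1, 2, 3}, Y: []float64{4}}
	bad.Len()
}
```

Without such a check, a mismatch would typically surface later as an index-out-of-range panic inside XY.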

The copying was a conscious decision. If we don't copy the data, then it can change out from under the plotter; it is safer to copy it. Is copying really causing a problem for anyone? Either the plotter does not copy and stores only a summary of the data (boxplot), or the plotter copies the data and plots all of it (lines, scatter, etc). In the first case there is no copy, so it's not a problem. In the second case, if you are passing so much data that you just can't copy it, then you will certainly have an issue when you go to plot that many points.

eaburns avatar Jun 08 '16 11:06 eaburns
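
The rationale for copying described above can be illustrated with a small sketch. The names point, points, and copyXYs are hypothetical stand-ins written for this example, not the plotter package's own code; the local XYer interface is assumed to mirror plotter.XYer.

```go
package main

import "fmt"

// XYer is a local stand-in, assumed to mirror plotter.XYer.
type XYer interface {
	Len() int
	XY(i int) (x, y float64)
}

// point and points are hypothetical stand-ins for a slice-of-structs type.
type point struct{ X, Y float64 }
type points []point

func (p points) Len() int                { return len(p) }
func (p points) XY(i int) (x, y float64) { return p[i].X, p[i].Y }

// copyXYs takes a snapshot of any XYer into a freshly allocated slice.
func copyXYs(d XYer) points {
	out := make(points, d.Len())
	for i := range out {
		out[i].X, out[i].Y = d.XY(i)
	}
	return out
}

func main() {
	src := points{{0, 1}, {1, 2}}
	snap := copyXYs(src)

	// Mutating the caller's data afterwards does not affect the snapshot.
	src[0].Y = 100
	fmt.Println("snapshot:", snap[0].Y, "source:", src[0].Y)
}
```

A plotter that kept the XYer itself instead of a snapshot would see the modified value at draw time, which is the trade-off between safety and flexibility being debated here.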

Yeah; we've had this discussion in the past. All our (gonum) data structures allow things to change underfoot unless they are created by the type/func. Because of this I think it's something worth looking at again. My interest is not size/allocations (which is our drive elsewhere) but flexibility.

kortschak avatar Jun 08 '16 11:06 kortschak

As a new user of plot and plotter, I would definitely not want to see this change. The current XYer approach is elegant and easy to use with a myriad of data types. I am plotting a lot of things, and in all cases I have a slice or set of much more complicated data structures from which I generate an 'X' and 'Y' for plot. Data definitely does not exist as X, Y []float64. I could construct that, but it would be much less user friendly than the current XYer interfaces. As it is with XYer, I even have cases where different types wrap the same underlying data structures, with different XY methods returning different 'X' and 'Y' values for different plots. If there is anything new, try to keep the XYer approach alongside it.

jasonpfox avatar Jun 19 '16 20:06 jasonpfox
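
The pattern described above, several XY methods over the same underlying data, might look something like the following. The Sample type, its fields, and the view names are invented for illustration; only the Len/XY method set comes from the thread.

```go
package main

import "fmt"

// XYer is a local stand-in, assumed to mirror plotter.XYer.
type XYer interface {
	Len() int
	XY(i int) (x, y float64)
}

// Sample is a made-up record with more fields than any single plot needs.
type Sample struct {
	Time, Temp, Pressure float64
}

// byTemp and byPressure are two views over the same []Sample.
// Converting a slice to either type shares the backing array; nothing is copied.
type byTemp []Sample

func (s byTemp) Len() int                { return len(s) }
func (s byTemp) XY(i int) (x, y float64) { return s[i].Time, s[i].Temp }

type byPressure []Sample

func (s byPressure) Len() int                { return len(s) }
func (s byPressure) XY(i int) (x, y float64) { return s[i].Time, s[i].Pressure }

func main() {
	samples := []Sample{
		{Time: 0, Temp: 20.5, Pressure: 1013},
		{Time: 1, Temp: 21.0, Pressure: 1009},
	}

	// The same data, presented as two different XY series.
	views := []XYer{byTemp(samples), byPressure(samples)}
	for _, v := range views {
		for i := 0; i < v.Len(); i++ {
			x, y := v.XY(i)
			fmt.Println(x, y)
		}
	}
}
```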

I don't think anyone here is suggesting to remove the XYer interface approach.

kortschak avatar Jun 19 '16 21:06 kortschak

A ZipXY(x, y []float64) XYer would definitely be appreciated. That's actually something that I was just searching for, as my data tends to be in slices of X/Y data (one slice for each).

Some convenience functions wouldn't hurt anyone, especially ones you would need to write again and again (or write your own package for, which defeats the point of using a plotting package in the first place).

brunnre8 avatar Sep 30 '16 08:09 brunnre8