gosling.js

More precise encoding types

Open manzt opened this issue 4 years ago • 2 comments

Right now all the track encoding fields (x, y, color, etc.) reuse the Channel definition; however, not every possible Channel definition is supported for each encoding field.

Some examples of definitions that are completely valid according to the schema but have no meaning and aren't supported:

"x": { "type": "quantitative", ... }
"x": { "value": 10 },
"color": { "type": "genomic", "aggregate": ..., ... }

Right now we implicitly rely on convention to ensure that users don't make these mistakes. But we could make unique types for each encoding field that capture the boundaries of what is allowed. For example, if "x" and "xe" only allow "type": "genomic", then there is no point in forcing users to specify a type: it can only be genomic. Similarly for "y": if only one "type" is supported, a user just needs to specify the field name.
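A minimal sketch of what a channel-specific type for "x" could look like, written in TypeScript to match the schema snippets in this thread. `XEncoding`, `RawChannel`, and `resolveX` are hypothetical names, not part of Gosling's schema. The runtime check mirrors what the narrower type would rule out for the examples above, and it fills in the only legal `type` so users never have to write it.

```typescript
type Axis = "top" | "bottom" | "none";

// Hypothetical narrowed encoding for "x": only genomic fields are meaningful.
interface XEncoding {
  field: string;
  type?: "genomic"; // the only legal value, so it can be omitted
  axis?: Axis;
}

// Stand-in for the generic Channel object that the current schema accepts.
interface RawChannel {
  field?: unknown;
  type?: unknown;
  value?: unknown;
  axis?: unknown;
  [key: string]: unknown;
}

// Rejects definitions that are schema-valid but meaningless for "x",
// then fills in the implied type.
function resolveX(def: RawChannel): XEncoding & { type: "genomic" } {
  if (def.value !== undefined) {
    throw new Error('"x" does not support constant values');
  }
  if (def.type !== undefined && def.type !== "genomic") {
    throw new Error(`"x" only supports type "genomic", got "${String(def.type)}"`);
  }
  const field = def.field;
  if (typeof field !== "string") {
    throw new Error('"x" requires a string "field"');
  }
  const resolved: XEncoding & { type: "genomic" } = { field, type: "genomic" };
  if (def.axis !== undefined) {
    // Axis values are passed through unchecked in this sketch.
    resolved.axis = def.axis as Axis;
  }
  return resolved;
}
```

With this, `resolveX({ field: "position" })` yields `{ field: "position", type: "genomic" }`, while `{ "type": "quantitative", ... }` and `{ "value": 10 }` are rejected.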

This also lends itself to making each definition more expressive and refining types for the domain use case, e.g.:

type X = {
 start: string,
 end?: string,
 axis?: Axis,
}
"x": { "start": "position", "axis": "top" },
// or
"x": { "start": "position", "end": "positionEnd", "axis": "top" },

In gos, this would end up looking something like:

gos.Track(...).encode(x=gos.X(start="position", end="positionEnd"))

and we could even add shorthand parsing so that it could be written as:

gos.Track(...).encode(x="position:positionEnd")
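The shorthand idea can be sketched as a small parser. gos is a Python library, so this TypeScript version is only an illustration of the parsing rule, not gos code; `parseXShorthand` and the `X` shape (taken from the type proposed above, minus `axis`) are hypothetical.

```typescript
// Merged "x" definition from the proposal (axis omitted for brevity).
interface X {
  start: string;
  end?: string;
}

// "position" -> { start: "position" }
// "position:positionEnd" -> { start: "position", end: "positionEnd" }
function parseXShorthand(shorthand: string): X {
  const parts = shorthand.split(":").map((p) => p.trim());
  const [start, end, ...rest] = parts;
  // Reject empty field names and anything with more than one ":".
  if (!start || end === "" || rest.length > 0) {
    throw new Error(`invalid x shorthand: "${shorthand}"`);
  }
  return end === undefined ? { start } : { start, end };
}
```

Splitting on a single `:` keeps the rule simple; it does assume field names never contain a colon.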

manzt • Sep 14 '21 13:09

We definitely need to use precise types (i.e., X, Y, Size, instead of a union Channel type)!

Your second idea on merging x and xe is interesting and might be a good idea. One issue I found with the current separate channels (i.e., x: {...}, xe: {...}) is that some properties (e.g., axis, domain, range) that should be specified only once can be specified multiple times:

x: { field: "start", axis: "top" },
xe: { field: "end", axis: "none" }, 
// we should pick only one of two axis properties internally
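To make the ambiguity concrete, here is a sketch of a check that flags properties set on both channels with different values, i.e., the cases where one value would have to be silently dropped. `PositionChannel`, `SHARED_KEYS`, and `findConflicts` are hypothetical, and the channel shape is simplified (a numeric `domain`, no `range`).

```typescript
// Simplified, hypothetical shape of the current separate x / xe channels.
interface PositionChannel {
  field: string;
  axis?: "top" | "bottom" | "none";
  domain?: [number, number];
}

// Properties describing the shared scale/axis, which should be set only once.
const SHARED_KEYS = ["axis", "domain"] as const;

// Returns the shared properties that x and xe both set, to different values.
function findConflicts(x: PositionChannel, xe: PositionChannel): string[] {
  return SHARED_KEYS.filter((key) => {
    const a = x[key];
    const b = xe[key];
    return a !== undefined && b !== undefined && JSON.stringify(a) !== JSON.stringify(b);
  });
}
```

Applied to the example above, this reports `["axis"]`; with a merged `x` definition the check becomes unnecessary because each property exists only once.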

This ambiguous and incorrect spec could be prevented with your idea, i.e.,

x: { start: "start", end: "end", axis: "top" } // axis can be defined only once

We use at most four different field names for a single axis (x, xe, x1, x1e) for band connections, so we could allow something like

x: { start: "start", end: "end", start2: "start2", end2: "end2" }

Users could still specify a single field, as they do now:

x: { field: "position", ... }
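Both forms can be captured in one union type and expanded into today's separate channels internally. This is a sketch under assumptions: `XDef`, `LegacyChannels`, and `toLegacy` are hypothetical, and the mapping `start2 -> x1`, `end2 -> x1e` is inferred from the four channel names listed above.

```typescript
type Axis = "top" | "bottom" | "none";

// Hypothetical merged "x" definition: a single field (as today) or up to
// four fields for band connections. The axis appears only once either way.
type XDef =
  | { field: string; axis?: Axis }
  | { start: string; end?: string; start2?: string; end2?: string; axis?: Axis };

// The existing separate channel names for the same fields.
type LegacyChannels = Partial<Record<"x" | "xe" | "x1" | "x1e", { field: string }>>;

// Expand a merged definition into the existing channels; the axis comes from
// a single property, so it cannot be specified inconsistently.
function toLegacy(def: XDef): { channels: LegacyChannels; axis?: Axis } {
  const channels: LegacyChannels = {};
  if ("field" in def) {
    channels.x = { field: def.field };
  } else {
    channels.x = { field: def.start };
    if (def.end !== undefined) channels.xe = { field: def.end };
    if (def.start2 !== undefined) channels.x1 = { field: def.start2 };
    if (def.end2 !== undefined) channels.x1e = { field: def.end2 };
  }
  return def.axis === undefined ? { channels } : { channels, axis: def.axis };
}
```

Because the `in` check narrows the union, TypeScript ensures the single-field branch never reads `start` and vice versa.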

I also like how it could end up in gos (i.e., x="start:end").

I don't see any downside to this approach from the end user's point of view for now, and I can think about it more while adding the explicit channel types. Vega-Lite uses separate channels as we currently do (i.e., x, x2), so I also wonder if there was any specific rationale for that.

sehilyi • Sep 16 '21 16:09

> Your second idea on merging x and xe is interesting and might be a good idea. One issue I found with the current separate channels (i.e., x: {...}, xe: {...}) is that some properties (e.g., axis, domain, range) that should be specified only once can be specified multiple times

I had this example in mind as well!

> Vega-Lite uses separate channels as we currently do (i.e., x, x2), so I also wonder if there was any specific rationale for that.

I'm curious whether this is related to the multiple data types that x is allowed to have (quantitative, ordinal, temporal, nominal), and to the fact that x2 can only be specified conditionally for quantitative encodings of x. In our case, by contrast, x can only be genomic for a track, so the two would only ever be specified together (optionally).

manzt • Sep 16 '21 17:09