APIs to interact with Selection

Open kanitw opened this issue 9 years ago • 48 comments

kanitw avatar Jan 16 '17 00:01 kanitw

This question in the vega-js group is very relevant to this issue.

kanitw avatar Apr 30 '17 21:04 kanitw

@arvind Can you post an example of how one might access a selection? I'm trying to build an application where you can crossfilter between different visualizations. Since I want to re-query the data, I cannot use a single spec for the different charts.

domoritz avatar Jul 07 '17 18:07 domoritz

Sure. The selection states are stored in datasets named selectionName_store (e.g., if you had a selection named brush, the dataset would be named brush_store). Accessing the dataset via the view api (view.data('brush_store')) gives you the constituent queries for each of the selection instances (i.e., no resolution will be performed). In the case of point (single/multi) selections, this will be an array of values; for interval selections it will be the data extents. You can similarly set the selection state via the view API provided the tuples you insert follow the same structure. Note: for interval selections, inserting new selection instances via the view API may not always correctly update the brush mark state.
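A minimal sketch of the one-time read described above. It assumes `view` is a Vega View (for example, the one returned by vega-embed) and that the selection is named `brush`. The helper name `readSelectionStore` and the stand-in `fakeView` object are hypothetical and exist only so the snippet runs on its own.

```javascript
// Sketch: read the raw, unresolved tuples of a selection from its backing dataset.
// `readSelectionStore` is a hypothetical helper, not part of Vega or Vega-Lite.
function readSelectionStore(view, selectionName) {
  // Store naming follows the convention described above: <selectionName>_store.
  const tuples = view.data(selectionName + '_store');
  // Copy so callers cannot mutate the array held by the dataflow.
  return tuples ? tuples.slice() : [];
}

// Stand-in for a real View, only so the sketch is self-contained.
const fakeView = {
  data: (name) => (name === 'brush_store' ? [{unit: '', intervals: []}] : [])
};

console.log(readSelectionStore(fakeView, 'brush'));
```

With a real view, the same call would be `readSelectionStore(view, 'brush')`.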

arvind avatar Jul 07 '17 19:07 arvind

"Note: for interval selections, inserting new selection instances via the view API may not always correctly update the brush mark state."

Why is that? Is there a way to correct the brush?

domoritz avatar Jul 07 '17 20:07 domoritz

The brush mark is currently driven by signals within each unit. It's difficult to update these signals based on updates to the backing dataset (we need to identify the matching tuple, and extract information from it). @jheer and I decided that this would be a limitation with 2.0 that we would address in subsequent releases once we better understood how users wanted to update selections via the API.

arvind avatar Jul 07 '17 20:07 arvind

/cc @djahandarie

mstone avatar Oct 13 '17 16:10 mstone

To make forward progress on this, I would like to decouple an API for writing to selections (which involves hairier design/implementation issues) from reading selections (which should hopefully be more straightforward). Here're some ideas sketching out the latter.

New API Methods

  • vl.selection(view, selectionName) -- returns an array of tuples that define a selection's predicate, respecting any resolution rules. For example, [{Origin: 'Japan', Year: 1981}, {Origin: 'USA', Year: 1982}] for a multi selection or [{Horsepower: [40, 150], Miles_per_Gallon: [40, 15]}] for an interval selection.

  • vl.addSelectionListener(view, selectionName, handler) and vl.removeSelectionListener(view, selectionName, handler).

Notes

  • Selection tuples are stored in datasets named selectionName_store, and the logic for evaluating these tuples as a predicate is encapsulated within Vega expression functions.

  • The vlPointDomain and vlIntervalDomain functions resolve the tuples, producing a list of selected values for a specific field. Reading a selection should invoke a more general version of these functions (e.g., vlPointValues and vlIntervalValues) that resolves all selected fields in a single pass, rather than one pass per field.

  • The simplest solution would be for each selection to add a top-level signal that calls the appropriate Values method. However, if selections are never read from externally, this incurs a performance penalty of re-evaluating selection tuples on every interaction event.

  • Alternatively, the vl.selection method could invoke these Vega expression functions directly. This strategy would keep selection logic encapsulated within expression functions, and would not incur the cost of needlessly resolving selection tuples on every interaction event. To do so, however, we need the following:

    • Parse a Vega expression outside a specification to generate a Function that can be invoked. The vl.selection could memoize this step by storing the generated Function on the view (e.g., view._vlPointValuesAST).

    • Selection expression functions register tuplesRef and indataRef on their scopes. Automatically registering these refs when an expression function is parsed externally seems problematic. An alternate solution might instead make it possible to explicitly declare needed refs as part of the specification?

    • view.addDataListener and view.removeDataListener functions that the Vega-Lite selection listener functions would map to.
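The listener mapping in the last bullet might look roughly like the sketch below. The two helper functions are hypothetical (nothing here is an existing Vega-Lite API), and the sketch passes along the raw tuples from the `_store` dataset without applying any resolution rules. `stubView` is a stand-in for a Vega View so the snippet runs on its own.

```javascript
// Hypothetical helpers that map selection listeners onto Vega data listeners.
function addSelectionListener(view, selectionName, handler) {
  // Report the selection name rather than the internal dataset name.
  const wrapped = (datasetName, tuples) => handler(selectionName, tuples);
  view.addDataListener(selectionName + '_store', wrapped);
  return wrapped; // keep this token to unregister later
}

function removeSelectionListener(view, selectionName, wrapped) {
  view.removeDataListener(selectionName + '_store', wrapped);
}

// Minimal stand-in for a Vega View; `emit` simulates a dataset update.
const stubView = {
  listeners: {},
  addDataListener(name, fn) {
    (this.listeners[name] = this.listeners[name] || []).push(fn);
    return this;
  },
  removeDataListener(name, fn) {
    this.listeners[name] = (this.listeners[name] || []).filter((f) => f !== fn);
    return this;
  },
  emit(name, value) {
    (this.listeners[name] || []).forEach((fn) => fn(name, value));
  }
};

const token = addSelectionListener(stubView, 'brush', (name, tuples) => {
  console.log(name, tuples);
});
stubView.emit('brush_store', [{Origin: 'Japan', Year: 1981}]);
removeSelectionListener(stubView, 'brush', token);
```

Returning the wrapped function is one way to make removal possible, since the handler registered with the view is not the one the caller supplied.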

I lean towards exposing selections as signals, as this is both the simplest and the most idiomatic solution, and it does not require modifications to Vega internals beyond the expression functions. Moreover, these new top-level signals could also offer a cleaner entry point for a future "selection write" API (e.g., writing to these signals would update the backing dataset and any downstream signals within views).

/cc @jheer, @kanitw, @domoritz

arvind avatar Apr 05 '18 16:04 arvind

This is great. I think having top level signals makes sense especially if we can use them to write. I wonder whether we even need the helper functions in that case or whether the Vega view API is sufficient.

domoritz avatar Apr 09 '18 22:04 domoritz

Yeah, I went back and forth on adding Vega-Lite helper functions. I lean towards adding them (rather than relying on the Vega view API alone) to give users a forward-compatible way to access selections agnostic to the Vega we generate. Thus, we would be free to change the underlying mechanisms of how selections could work in the future.

An interesting question is whether we are protected from all of this with semantic versioning. If we point users to Vega view APIs to access selections, then we're implicitly extending the semantic versioning contract to the Vega we generate. This has advantages (e.g., Lyra would certainly appreciate being able to rely on this definition of semantic versioning, as it analyzes the generated Vega). But I'm not sure how feasible this would actually be, or how we would define what major/minor changes in Vega-Lite -> Vega generation would be...

arvind avatar Apr 09 '18 22:04 arvind

I don't know how forward compatible we need to be and I don't see us changing how selections are implemented anytime soon. Thus, I lean towards not providing helper functions.

Every Vega-Lite version already has a minimum Vega version it depends on. We can make a promise about the specific signals while still being flexible about how we generate other parts of the spec.

domoritz avatar Apr 09 '18 22:04 domoritz

I think we should provide helper functions because it's not a realistic expectation that Vega-Lite users should know how we name the underlying data sources and signals.

Plus, there is no "signal" concept in Vega-Lite, so asking users to use signal APIs (which are a lower-level abstraction) is a bit weird.

kanitw avatar Apr 09 '18 23:04 kanitw

Let's see how we name the signals. I'm expecting the signal names to directly correspond to the selection names.

domoritz avatar Apr 09 '18 23:04 domoritz

Sorry if this isn't the right place for this question, but does this mean if I'm using Vega-Embed and I have a selection in my spec like this:

selection: {
    brush: {
        encodings: ['x'],
        type: 'interval'
    }
}

Then the correct way to access that selection is like this?

view.addDataListener('brush_store', function (name, value) {
    console.log(value[0].intervals[0].extent)
})

This is working for me, but seems a bit verbose? Is there a simpler API for this now?
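The listener above indexes straight into `value[0]`, which throws if the store has no tuples. Below is a slightly more defensive sketch. It assumes the tuple shape shown in the snippet above (`intervals[0].extent`), and the helper name `firstIntervalExtent` is hypothetical.

```javascript
// Sketch: pull the first interval extent out of the store tuples, or null if absent.
// Assumes tuples shaped like {intervals: [{extent: [lo, hi]}]}, as in the snippet above.
function firstIntervalExtent(tuples) {
  const first = tuples && tuples[0];
  const interval = first && first.intervals && first.intervals[0];
  return interval ? interval.extent : null;
}

// Usage with a real view (not executed here):
// view.addDataListener('brush_store', function (name, value) {
//   console.log(firstIntervalExtent(value));
// });

console.log(firstIntervalExtent([{intervals: [{extent: [40, 150]}]}]));
console.log(firstIntervalExtent([]));
```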

simon-lang avatar Oct 18 '18 00:10 simon-lang

@simon-lang We are working on improving this, which we will release in Vega-Lite 3. See https://github.com/vega/vega-lite/pull/4068 for details.

domoritz avatar Oct 18 '18 07:10 domoritz

@arvind we can close this issue once we have documented the new API, right?

domoritz avatar Oct 18 '18 07:10 domoritz

Thanks for checking in @simon-lang. As @domoritz mentioned, we should have good news to share on this front soon :)

@domoritz, we're tracking selection API documentation in #2790 so this issue should be safe to close.

arvind avatar Oct 18 '18 13:10 arvind

Thanks @domoritz & @arvind . I just discovered Vega recently and I'm absolutely loving it. Keep up the great work!

simon-lang avatar Oct 19 '18 04:10 simon-lang

Just got a chance to follow up on this.

The refactor in #4068 is very helpful for interacting with selection data.

However, whether we (1) have a thin wrapper around the signal APIs for the selection APIs or (2) ask users to directly use the signal APIs is still an open question?

I still slightly prefer (1), as it is weird if we provide abstraction only at the syntax level but not at the API level. That said, I'm happy to hear the reasoning for (2) too.

kanitw avatar Dec 21 '18 01:12 kanitw

Another question. The signal refactor definitely makes reading the selection quite straightforward.

However, I remember @arvind mentioned that setting the selection is still tricky. If so, should we start a new issue to discuss the setting part? The trickiest part is probably how to handle faceted data -- but I'm starting to wonder whether a combination of group name and key would be sufficient to identify different subplots in the scenegraph?

kanitw avatar Dec 21 '18 03:12 kanitw

Looking closer at the selection codebase, the unit in the selection data store is currently generated using the unitName() method.

For faceted plots, which can contain multiple units of the same name, we currently append values from the row and column fields to the unit name (delimited by _). For example, for a plot that facets by cylinder, a unit name is child_6.

However, this unitName() method has two issues:

  1. It only includes the row/column values of the nearest ancestor that is a facet. For nested facets, we can still generate redundant unit names (as the upper facet's values won't be considered). -- This is currently not very critical, as we hide our support for nested facets while we still have to deal with other nested facet issues such as https://github.com/vega/vega-lite/issues/2761 (and thus, we still hide nested facets from the official schema).

  2. More importantly, setting the selection by writing to the datasets via the view data APIs would be tricky for users. Basically, it is hard to provide the right unit name, as we have quite an arbitrary format (row value first, and then the column value). Once we support nested facets, this will get even hairier.

To resolve 2), we could consider splitting key aspects from unit. For example, {unit: "child_6"}, where 6 is the Cylinders value, could become {unit: 'child', key: {Cylinders: 6}}. (Or some similar design.) However, we use the unit as the lookup key for selection (e.g., with resolve: "global"), so splitting this would make the comparison inefficient. Thus, this won't really work.

Instead, we may need to explain this complicated rule in the docs for interacting with selections. However, I think the unit name rule is too complicated. Thus, it is better to provide a selection set/write API that takes input in a more user-friendly format (e.g., {unit/name: 'child', key: {Cylinders: 6}}) and converts it into the internal unit key. If we use this as the input, then I think we should use the same format for the output of the read API.
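To make the conversion idea concrete, here is a rough sketch of how a user-friendly key could be turned into the internal unit name. It is only an illustration under the rules stated in this comment (row value first, then column value, delimited by _). The function `toUnitName` and its `facetFields` argument are hypothetical, and the real unitName() logic may sanitize values differently.

```javascript
// Hypothetical converter from {unit, key} to the internal unit name.
// facetFields says which field is faceted on each channel, e.g. {column: 'Cylinders'}.
function toUnitName(spec, facetFields) {
  const parts = [spec.unit];
  // Row value first, then column value, as described above.
  for (const channel of ['row', 'column']) {
    const field = facetFields[channel];
    if (field !== undefined && spec.key[field] !== undefined) {
      parts.push(String(spec.key[field]));
    }
  }
  return parts.join('_');
}

console.log(toUnitName({unit: 'child', key: {Cylinders: 6}}, {column: 'Cylinders'}));
// prints "child_6"
```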

kanitw avatar Dec 21 '18 05:12 kanitw

Note that I think the format {Cylinders: 6} should be reasonable as nested / crossed facet shouldn't use the same field in multiple row/column channels. Even if users do, many of the cells will be empty and this still won't generate cells with redundant unit names.

kanitw avatar Dec 21 '18 05:12 kanitw

Oh, but the other tricky part is repeat, as @arvind originally suspected. Basically, we populate a subplot's unit name by appending the repeated variable name to the original unit name. So whatever structure we provide to support facet above should support the repeat use case as well.

To kick start the conversation, I think we should split repeat from facet as repeat deals with field name while facet deals with field value. Perhaps we could do: {unit: 'child', repeat: {column: "field_name"}, facet: {Cylinders: 6}}. For nested repeat, the key becomes the "as" of the repeater as proposed in https://github.com/vega/vega-lite/issues/2767.

(Btw, having this long thread makes me feel like we should adopt RFCs repo like React. -- It might not be that much more work, but will provide a nice way to iteratively improve proposal.)

kanitw avatar Dec 21 '18 05:12 kanitw

Hi, is this feature available in the latest release? If not, when can we expect it? What I am trying to do is set a selection based on some external event. Please let me know if there is any way to do so.

dileepyelleti avatar May 29 '19 05:05 dileepyelleti

Currently, you can read but not write selections.

domoritz avatar May 29 '19 21:05 domoritz

Hello, What is the current status of this issue? I tested with @simon-lang's addDataListener approach and that worked. But I wonder if there is a more elegant way than filtering my dataset based on the selection coordinates.

bearzx avatar Jun 11 '19 23:06 bearzx

@bearzx See my comment from two weeks ago. We hope to have a write API for Vega-Lite 4 but no promises.

domoritz avatar Jun 12 '19 03:06 domoritz

I meant what is the API for reading? Do I still need to go through the view API to set a DataListener to read the selection information like @simon-lang mentioned?

bearzx avatar Jun 12 '19 04:06 bearzx

Ok I guess I get it now after re-reading the documents of View APIs. I suppose I still need to know the "internal" variable naming convention: for a one time read, the best way is view.data("brush_store") assuming I have a selection named brush.

bearzx avatar Jun 12 '19 19:06 bearzx

Yep, for now you go through the view API. @arvind implemented it so that the names are consistent with the names of the selections.
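A small sketch of what reading through the view API could look like with signals. It assumes the compiled Vega spec exposes a top-level signal named after the selection, as discussed earlier in this thread; it is worth checking the generated Vega spec for your Vega-Lite version to confirm. The helper `watchSelection` and the stand-in `signalStub` are hypothetical.

```javascript
// Sketch: observe a selection through a top-level signal.
// ASSUMPTION: a signal with the same name as the selection exists in the compiled spec.
function watchSelection(view, selectionName, handler) {
  view.addSignalListener(selectionName, handler);
  // Return an unsubscribe function so callers need not keep the handler around.
  return () => view.removeSignalListener(selectionName, handler);
}

// Minimal stand-in for a Vega View; `fire` simulates a signal update.
const signalStub = {
  handlers: {},
  addSignalListener(name, fn) {
    (this.handlers[name] = this.handlers[name] || []).push(fn);
    return this;
  },
  removeSignalListener(name, fn) {
    this.handlers[name] = (this.handlers[name] || []).filter((f) => f !== fn);
    return this;
  },
  fire(name, value) {
    (this.handlers[name] || []).forEach((fn) => fn(name, value));
  }
};

const stop = watchSelection(signalStub, 'brush', (name, value) => console.log(name, value));
signalStub.fire('brush', {Horsepower: [40, 150]});
stop();
```

For a one-time read with a real view, `view.signal('brush')` would be the signal-based counterpart of `view.data('brush_store')`, under the same naming assumption.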

domoritz avatar Jun 12 '19 21:06 domoritz

FWIW, this discussion/confusion makes it clear that we should try to include this in 4.0 if we have time to do so.

kanitw avatar Jun 12 '19 21:06 kanitw