datasette-vega

Not obvious that vega charts are plotted only for rows on the visible page

Open Kabouik opened this issue 4 years ago • 5 comments

I filtered a dataset on some criteria and got 265 results, split over three pages (100, 100, 65), and realized that Vega plots only use the results displayed on the current page rather than the whole filtered data: 100 rows on page 1, 100 on page 2, 65 on page 3. Is there a way to make the charts use all results instead of just the current page, given that page boundaries rarely carry any meaningful information?

Likewise, while the cluster map does show all results on the first page, on subsequent pages it shows all remaining results minus those on the previous page(s): 265 on page 1, 165 on page 2, 65 on page 3.

In both cases, I don't see many situations where one would want to represent the data this way, and it might even lead to misinterpretation of the data. Am I missing cases where this would be best? Perhaps a clickable option to switch visualizations between the visible page and all search results would help?

[Edit] Oh, I just saw the "Load all" button under the cluster map, as well as the setting to change the maximum number of results. So I guess this issue is only about the Vega charts.

Kabouik, Feb 18 '21 12:02

> [Edit] Oh, I just saw the "Load all" button under the cluster map, as well as the setting to change the maximum number of results. So I guess this issue is only about the Vega charts.

Note that datasette-cluster-map also seems to be limited to 998 displayed points:

[Screenshot: ss-2021-02-18_140548]

Kabouik, Feb 18 '21 13:02

I'm going to transfer this over to the datasette-vega repo and respond there.

simonw, Feb 22 '21 21:02

This is a really interesting problem.

Visualizing all rows in a query can be infeasible due to size - I have Datasette tables with 5 million plus rows in them, and loading all of that data in order to feed it to Vega isn't going to work.

But... you're absolutely right that this is confusing. If you plot a chart against just the first page without realizing it, and then make an incorrect decision about your data, that's bad!

So I think there may be a short-term and a long-term change to make here.

Short-term: make it much more obvious that only the first page of results is being plotted.

Long-term: provide better support for plotting against SQL queries that aggregate data.

Any time you want to plot 5m rows on a chart you're likely looking to plot some kind of aggregation - number of rows added per day, or average X per month or similar. You can do that with datasette-vega at the moment by constructing a custom SQL query using the SQL sum() and count() and avg() functions and suchlike... but you need to know reasonably advanced SQL in order to do that!
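To make the aggregation idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `events` table and its `created` and `value` columns are hypothetical stand-ins for a large Datasette table; the query collapses many rows into one row per day, which is the kind of small result set datasette-vega can chart comfortably.

```python
import sqlite3

def daily_summary(conn):
    # Collapse one-row-per-record into one-row-per-day using count(), avg()
    # and sum(). Table and column names here are hypothetical.
    sql = """
        select date(created) as day,
               count(*) as n_rows,
               avg(value) as avg_value,
               sum(value) as total_value
        from events
        group by day
        order by day
    """
    return conn.execute(sql).fetchall()

# Tiny in-memory example table so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("create table events (created text, value real)")
conn.executemany("insert into events values (?, ?)", [
    ("2021-02-18 12:02:00", 1.0),
    ("2021-02-18 13:02:00", 3.0),
    ("2021-02-22 21:00:00", 5.0),
])

for row in daily_summary(conn):
    print(row)
```

The same `select ... group by` statement could be pasted into Datasette's custom SQL box; the chart then plots one point per day no matter how many underlying rows there are.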

So ideally Datasette would help you out here - it would provide a UI for calculating some of these common aggregations even without SQL knowledge, and that UI would further make it easy for you to plot them on a chart.

simonw, Feb 22 '21 21:02

Sounds good!

For the mid-term, an additional "Included rows:" field, where users could enter whatever range they want (at their own risk, or with a cap on the maximum number of rows set in settings.json), could be useful too. Page length is never relevant to the actual data, and it can only be changed in settings.json, which not all users can access and which isn't meant to be changed dynamically to suit individual needs. This would require rows to be numbered, though, and I don't think that's currently the case in custom queries (unless there's a sqlite-utils function I missed).
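On the row-numbering point, plain SQLite can number rows on the fly with the `row_number()` window function (SQLite 3.25.0 and later), so a custom query could select an arbitrary range. This is a minimal sketch, not an existing Datasette or sqlite-utils feature; the `items` table and the `rows_in_range` helper are hypothetical, and rows are assumed to be numbered in `id` order.

```python
import sqlite3

# Hypothetical table standing in for the result of a filtered query.
conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, name text)")
conn.executemany("insert into items (name) values (?)",
                 [(f"item{i}",) for i in range(1, 6)])

def rows_in_range(conn, start, end):
    """Return rows start..end (1-based, inclusive), numbered in id order.

    row_number() is a window function and needs SQLite 3.25.0 or later.
    """
    sql = """
        select * from (
            select row_number() over (order by id) as rownum, id, name
            from items
        )
        where rownum between ? and ?
        order by rownum
    """
    return conn.execute(sql, (start, end)).fetchall()

print(rows_in_range(conn, 2, 3))
```

In Datasette's SQL editor the `?` placeholders would become named parameters such as `:start` and `:end`, which Datasette renders as form fields, giving something close to the proposed "Included rows:" input without changing settings.json.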

Kabouik, Feb 23 '21 16:02

https://global-power-plants.datasettes.com/global-power-plants/global-power-plants is a good example of the "Load all" button below the map, which shows its progress toward displaying all the items on the map. This would be very helpful for datasette-vega charts as well. Pressing that button doesn't also force all rows into a single-page table, which is nice. Right now, the datasette-vega workaround of displaying every single record on one page (to get an accurate chart) makes for slow load times and an overwhelming interface.
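A "Load all" for charts would amount to following the pagination links until the data runs out, while leaving the HTML table paginated. The sketch below assumes each JSON page carries a `rows` list and a `next_url` key (Datasette's table JSON includes `next_url`, but check the version you run). The `iter_all_rows` helper and its `max_rows` cap are hypothetical, the cap being one way to avoid pulling millions of rows into the browser.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen

def fetch_json(url: str) -> dict:
    # Plain HTTP GET returning parsed JSON; any callable with this shape
    # (for example a fake used in testing) can be passed in instead.
    with urlopen(url) as resp:
        return json.load(resp)

def iter_all_rows(first_url: str,
                  fetch: Callable[[str], dict] = fetch_json,
                  max_rows: int = 10_000) -> Iterator[dict]:
    """Yield rows page by page, following next_url, up to max_rows."""
    url = first_url
    seen = 0
    while url and seen < max_rows:
        page = fetch(url)
        for row in page["rows"]:
            if seen >= max_rows:
                return
            yield row
            seen += 1
        # next_url is assumed to be null/None on the last page.
        url = page.get("next_url")
```

A chart could then be fed with something like `list(iter_all_rows(table_url + ".json?_shape=objects"))`, where `table_url` is the table page URL, so the plot covers every filtered row (up to the cap) rather than only the visible page.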

mroswell, Apr 12 '21 02:04