Support histograms

Open maelp opened this issue 2 years ago • 9 comments

It's not clear from the documentation how to create histogram plots. Is this possible out of the box, or do we need to do our own binning and then use a bar chart?

I think histograms are among the most widespread chart types, so it might be useful to have a few examples in the documentation.

maelp avatar Jan 31 '24 21:01 maelp

Here is an existing example (among others) consisting of three linked histograms: https://uwdata.github.io/mosaic/examples/flights-200k.html

And yes, the current model is to perform binning and create a bar chart.

Agreed that the documentation might benefit from a simpler example with "Histogram" in the name!
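
As an illustration of the "perform binning, then draw bars" model described above, here is a minimal sketch of the binning step alone, written in plain TypeScript. The `histogram` helper and the `Bin` shape are hypothetical names chosen for this example and are not part of Mosaic or vgplot; in Mosaic itself the binning runs in the database rather than in client code.

```typescript
// Hypothetical helper for illustration only; not a Mosaic or vgplot API.
interface Bin {
  x1: number; // inclusive lower edge
  x2: number; // exclusive upper edge (the last bin also includes the maximum)
  count: number;
}

// Group values into fixed-width bins aligned to multiples of `step`.
function histogram(values: number[], step: number): Bin[] {
  if (values.length === 0 || step <= 0) return [];
  const min = Math.min(...values);
  const max = Math.max(...values);
  const start = Math.floor(min / step) * step;
  const n = Math.max(1, Math.ceil((max - start) / step));

  const bins: Bin[] = [];
  for (let i = 0; i < n; i++) {
    bins.push({ x1: start + i * step, x2: start + (i + 1) * step, count: 0 });
  }
  for (const v of values) {
    // Clamp so that a value equal to the top edge lands in the last bin.
    const i = Math.min(n - 1, Math.floor((v - start) / step));
    bins[i].count += 1;
  }
  return bins;
}

// Example input (made-up numbers): each resulting bin can be drawn as one bar
// spanning x1..x2 with height `count`.
const delays = [1, 2, 2, 7, 11, 14, 20];
const delayBins = histogram(delays, 5);
```

Each `{x1, x2, count}` record maps directly onto a rectangle in a bar chart, which is why a histogram needs no dedicated mark once the data is binned.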

jheer avatar Jan 31 '24 21:01 jheer

Thanks!

maelp avatar Jan 31 '24 21:01 maelp

I think it would also help to make the distinction between Mosaic and vgplot clearer. The introduction does a very good job of explaining what Mosaic is and how it works, but maybe we can be more explicit about the fact that vgplot is not the only way to use Mosaic.

domoritz avatar Jan 31 '24 21:01 domoritz

I agree. It would also be useful to document how to easily build an “escape hatch” for using methods from Vega-Lite or Observable with Mosaic.

maelp avatar Jan 31 '24 22:01 maelp

Can you elaborate? We already explain extensibility in https://uwdata.github.io/mosaic/why-mosaic/#mosaic-is-extensible and have docs for building clients at https://uwdata.github.io/mosaic/core/#clients.

domoritz avatar Jan 31 '24 23:01 domoritz

Another example that would be really fantastic is one showing how to bin timestamp data. There are some really good examples of how to plot data by day of week or by month of year, but I just can't get a linear timescale to plot.

vgplot.bin() throws the error "Binder Error: No function matches the given name and argument types '-(TIMESTAMP, BIGINT)' You might need to add explicit type casts." when you try to do that, for example with the rectY mark. Perhaps there is something really rudimentary I am missing in order to get that to work?

I am able to get an areaY to render with timestamp data, but I believe that for performance reasons it would be significantly better to have the data binned first.

Unemyr avatar Feb 05 '24 06:02 Unemyr

An update on the histogram with a timescale dimension - I was able to make that work, using the following approach:

  1. Created a customized dateYearMonthDay() function, based on the existing dateMonth() API. It needs an input argument stating whether the X1 or X2 value should be calculated (X2 will add a +1 on the year date).

  2. Modified the rectY function to take x1 and x2 as inputs, referencing the function above.

However, I am not sure this would be considered best practice. Should the bin() function be able to seamlessly support this, as it does for other data types (or, if it doesn't today, is the aspiration that it will once implemented)? Feel free to comment on any better approach to achieve this. I do think an example for this would be very useful, as many people also use timescale elements for bar charts (business reports, etc.).
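
The general idea behind the two steps above (compute an interval start and end for each timestamp, then draw a rectangle between them) can be sketched in plain TypeScript. This is not the code from this comment and not a Mosaic API: `binByDay` and `DayBin` are hypothetical names, and the sketch assumes one-day bins on UTC boundaries.

```typescript
// Hypothetical helper for illustration only; not a Mosaic or vgplot API.
const DAY_MS = 24 * 60 * 60 * 1000;

interface DayBin {
  x1: Date; // start of the UTC day (inclusive)
  x2: Date; // start of the next UTC day (exclusive)
  count: number;
}

// Count timestamps per UTC day and return one {x1, x2, count} record per day.
function binByDay(timestamps: Date[]): DayBin[] {
  const counts: Record<number, number> = {};
  for (const t of timestamps) {
    // Floor the epoch milliseconds to the start of the containing UTC day.
    const start = Math.floor(t.getTime() / DAY_MS) * DAY_MS;
    counts[start] = (counts[start] || 0) + 1;
  }
  return Object.keys(counts)
    .map(Number)
    .sort((a, b) => a - b)
    .map((start) => ({
      x1: new Date(start),
      x2: new Date(start + DAY_MS),
      count: counts[start],
    }));
}

// Example input (made-up timestamps): two events on one day, one on the next.
const events = [
  new Date("2024-02-05T06:00:00Z"),
  new Date("2024-02-05T23:59:59Z"),
  new Date("2024-02-06T02:00:00Z"),
];
const dayBins = binByDay(events);
```

Working in epoch milliseconds keeps the arithmetic numeric throughout, so the bin edges are only converted back to dates at the end.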

Unemyr avatar Feb 06 '24 02:02 Unemyr

Hi @Unemyr, this is the direction I would recommend. The vgplot bin transform is specifically focused on binning quantitative values, and by design it does not operate on date-time data and related intervals (year, quarter, month, etc.). I'd recommend opening a new feature request issue for time bin functions that produce the desired intervals (not unlike what Vega-Lite provides). We'd also be happy to review PRs along these lines.
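
To make "time bin functions that produce the desired intervals" concrete, here is a rough sketch of what such a function might compute for calendar units. It is written in plain TypeScript and assumes UTC boundaries; `timeInterval` and `Interval` are hypothetical names for this example and do not describe an existing or planned Mosaic API.

```typescript
// Hypothetical helper for illustration only; not a Mosaic or vgplot API.
type Interval = "year" | "quarter" | "month";

// Return [start, end) of the UTC calendar interval that contains `t`.
function timeInterval(t: Date, interval: Interval): [Date, Date] {
  const year = t.getUTCFullYear();
  const month = t.getUTCMonth(); // 0-based: January is 0
  const span = interval === "year" ? 12 : interval === "quarter" ? 3 : 1;
  const startMonth = Math.floor(month / span) * span;
  // Date.UTC normalizes month overflow, so month 12 rolls into the next year.
  return [
    new Date(Date.UTC(year, startMonth, 1)),
    new Date(Date.UTC(year, startMonth + span, 1)),
  ];
}

// Example: the quarter containing an arbitrary timestamp.
const [quarterStart, quarterEnd] = timeInterval(new Date("2024-02-06T20:02:00Z"), "quarter");
```

Unlike fixed-width numeric bins, these intervals vary in length (months have 28 to 31 days), which is one reason calendar binning is usually handled separately from quantitative binning.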

jheer avatar Feb 06 '24 20:02 jheer

OK noted on that. I would be open to contributing PRs for that later. Thanks for the quick reply!

Unemyr avatar Feb 06 '24 22:02 Unemyr

I'll close this for now since Mosaic supports histograms.

domoritz avatar May 23 '24 14:05 domoritz