redash
redash copied to clipboard
Histogram
What type of PR is this? (check all applicable)
- [x] Feature
Description
New visualization: Histogram by Plotly No tests ready yet.
Related Tickets & Documents
Mobile & Desktop Screenshots/Recordings (if there are UI changes)
Example1: https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493 Example2: https://deploy-preview-4426--redash-preview.netlify.app/queries/224#955




Could you advise what level of separation from chart visualization is necessary?
For example, I have added function prepareHistogramSeries(series, options) to chart/plotly/prepareDefaultData.js. Should I leave this function and the related code in the file here? Should I copy the whole file to histogram folder and leave the chart folder alone? If so, should I erase everything unrelated to histogram once I copy the file to histogram folder? Would it be better to move things to some common library?
@arikfr Why do you think that histogram series cannot be used with other series types? Brief googling found pictures like this one:

@kravets-levko it looks like a line representation of the histogram. Can you think of a real use case? @deecay do you have a use case for mixing them?
@arikfr The curved line over histograms are density curves, which is a fairly common companion of the bar-histogram. The curve is a different representation of the same data set. I can't think of any other chart type mix to be honest.
Could you advise what level of separation from chart visualization is necessary?
The most important part is to have a dedicated interface for the Histogram visualization. The implementation can still share code with charts if it makes sense and/or makes code simpler.
@kravets-levko
Resolved conflict and incorporated your point on utility function.
One thing I could not resolve is the issue around automatic update of the BinSize and BinStart. When an input value for these two inputs gets deleted in the editor, it should redraw the chart with default value but it doesn't. I've played around with debounce but I couldn't get it to work. Any help would be welcomed.
Thank you @deecay! I'll check the issue with inputs, hope there's nothing serious. One more thing I'd like to revive in this discussion - having a new Histogram visualization vs. Histogram chart type (within existing Chart viz) - what's your point on this?
Hi @kravets-levko
On chart types: About a year ago I have tried to implement plotly charts (treemap and radar) using "new Visualization" method. It was difficult. What I realized was that the current structure of (/chart/ and /plotly/) functions was not easy to re-use in a new Visualization. I had to copy decent amount of code from the chart visualization, such as chart resizing, chart refreshing, Data Labels Editor interface, etc, and then tweak it. It made me want to just copy the chart visualization entirely and modify that copied code. But this method obviously decreases maintainability.
Ideally, we could refactor the code so that adding a new plotly-based visualization can be done with relatively small effort, avoiding duplicate code. This PR, the existing Chart viz method, has just +100 lines of code, with almost no duplicate code. So this is my preferred method for now.
If you could show me and example and/or mentor me, I might be able to see things differently.
Eager to hear your thoughts.
This feature is so close to completion, I can feel it. :)
Any ETA on this?
Currently only merging fixes directly related to the V10 beta. We will pipeline this for merge after the V10 release next month.
Any updates on this
would be great if this would make it into a release!
Thanks for the ping! Will aim to merge and include this in V11 this summer.
@susodapop Any news on this PR? This would be a great feature
@susodapop yeah just wanted to bump this too!
Bump. This feature would be very useful.
@deecay , thanks for the PR! We've updated a lot of things now that we're Community-driven so - if you're still interested in getting this merged - would you mind rebasing off master to re-run the CI, as well as updating merge conflicts?
We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that.
Yeah, this looks like it'd be really useful. Hopefully it gets rebased. :smile:
@deecay do you have time to rebase this change? There are quite a few conflicts, but all of them appear to be easy to resolve
Codecov Report
Merging #4426 (7f9287a) into master (650ec90) will increase coverage by
0.03%. Report is 12 commits behind head on master. The diff coverage isn/a.
@@ Coverage Diff @@
## master #4426 +/- ##
==========================================
+ Coverage 61.07% 61.10% +0.03%
==========================================
Files 157 158 +1
Lines 12834 12883 +49
Branches 1747 1753 +6
==========================================
+ Hits 7838 7872 +34
- Misses 4755 4765 +10
- Partials 241 246 +5
@guidopetri @justinclift Rebased.
Awesome! :smile:
@eradman Any interest in reviewing this PR?
Awesome! 😄
@eradman Any interest in reviewing this PR?
Yes, I'll take this change for a test drive
I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.
Now that I spent some time with this I feel that the "binning" function should be more generic. In SQL we can bin values in may ways. For example this bins by powers of 2
WITH series AS (
SELECT
power(2, series-1)::int AS lo,
power(2, series)::int AS hi
FROM generate_series(0, 10) AS series
)
SELECT count(files), lo || '-' || hi AS range
FROM dir_stats
JOIN series ON (files >= series.lo AND files < series.hi)
GROUP BY series, lo, hi
ORDER BY lo
This could be easily adapted to date ranges, range categories, and so on.
A bar chart works fine, except that I do not believe that Redash supports multiple series per graph yet, so perhaps this is the feature we should focus on.
Secondarily, binning options could be added, but I feel that would be somewhat complicated to support multiple scales and data types.
@eradman
I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.
Does the CPU spin for test deployed instance too? (2000 entries) https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493
I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.
Does the CPU spin for test deployed instance too? (2000 entries) https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493
I seem to have no trouble with that demo. I'll try rebuilding my dev environment to see if this is repeatable
Tried cleaning my test env (git clean, docker rm -f ...) and rebuilding dependencies I still hit the same problem with CPU. Is the demo link on netlify.com up to date?
What do the rest of you think?
My thoughts:
- The histogram feature should be a data filter, not a new graph type (Apologies in an advance! I know this contradicts early feedback)
- Binning on the client side won't scale very well, so this transform should happen on the server-side (this is the current limitation of pivot tables)
- The X axis should be able to handle dates and numbers
@eradman
Is the demo link on netlify.com up to date?
I think it is. And I think I've found the reason for the high CPU usage.
This problem happens when the X-axis data is specified as, or looks like, a date. In these cases plotly treats x-axis as type: date. The binning function in histogram treats the value of bin size differently between number axis and dates axis. If the x-axis is numbers, size: 1 means 1.0. But if x-axis is date, then size: 1 means 1ms (!). Therefore, when you have a data with timeseries with range scale of months or years, plotly tries to bin that data with 1ms-bin and gets overloaded.
I will look into what I can do with date binning. (your thought #3)
@eradman
I have added some features.
- Chart will not render if xAxis bin size is "too small". Here, "too small" means that the resulting bin count will be > 10000.
- xAxis bin size now works for date axis, too. Possible values for date axis bin size are:
a. number in ms (86,400,000 will represent a day (1000 * 60 * 60 * 24) ) b. "M1", "M2", etc. where M1 is one month, M2 is two months. See Example2: https://deploy-preview-4426--redash-preview.netlify.app/queries/224#955