redash icon indicating copy to clipboard operation
redash copied to clipboard

Histogram

Open deecay opened this issue 5 years ago • 31 comments

What type of PR is this? (check all applicable)

  • [x] Feature

Description

New visualization: Histogram by Plotly No tests ready yet.

Related Tickets & Documents

Mobile & Desktop Screenshots/Recordings (if there are UI changes)

Example1: https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493 Example2: https://deploy-preview-4426--redash-preview.netlify.app/queries/224#955

image

image

image

image

deecay avatar Dec 08 '19 18:12 deecay

Could you advise what level of separation from chart visualization is necessary?

For example, I have added function prepareHistogramSeries(series, options) to chart/plotly/prepareDefaultData.js. Should I leave this function and the related code in the file here? Should I copy the whole file to histogram folder and leave the chart folder alone? If so, should I erase everything unrelated to histogram once I copy the file to histogram folder? Would it be better to move things to some common library?

deecay avatar Dec 13 '19 18:12 deecay

@arikfr Why do you think that histogram series cannot be used with other series types? Brief googling found pictures like this one:

image

kravets-levko avatar Dec 13 '19 18:12 kravets-levko

@kravets-levko it looks like a line representation of the histogram. Can you think of a real use case? @deecay do you have a use case for mixing them?

arikfr avatar Jan 21 '20 09:01 arikfr

@arikfr The curved line over histograms are density curves, which is a fairly common companion of the bar-histogram. The curve is a different representation of the same data set. I can't think of any other chart type mix to be honest.

deecay avatar Jan 21 '20 14:01 deecay

Could you advise what level of separation from chart visualization is necessary?

The most important part is to have a dedicated interface for the Histogram visualization. The implementation can still share code with charts if it makes sense and/or makes code simpler.

arikfr avatar Jan 22 '20 09:01 arikfr

@kravets-levko

Resolved conflict and incorporated your point on utility function.

One thing I could not resolve is the issue around automatic update of the BinSize and BinStart. When an input value for these two inputs gets deleted in the editor, it should redraw the chart with default value but it doesn't. I've played around with debounce but I couldn't get it to work. Any help would be welcomed.

deecay avatar Oct 19 '20 10:10 deecay

Thank you @deecay! I'll check the issue with inputs, hope there's nothing serious. One more thing I'd like to revive in this discussion - having a new Histogram visualization vs. Histogram chart type (within existing Chart viz) - what's your point on this?

kravets-levko avatar Oct 19 '20 10:10 kravets-levko

Hi @kravets-levko

On chart types: About a year ago I have tried to implement plotly charts (treemap and radar) using "new Visualization" method. It was difficult. What I realized was that the current structure of (/chart/ and /plotly/) functions was not easy to re-use in a new Visualization. I had to copy decent amount of code from the chart visualization, such as chart resizing, chart refreshing, Data Labels Editor interface, etc, and then tweak it. It made me want to just copy the chart visualization entirely and modify that copied code. But this method obviously decreases maintainability.

Ideally, we could refactor the code so that adding a new plotly-based visualization can be done with relatively small effort, avoiding duplicate code. This PR, the existing Chart viz method, has just +100 lines of code, with almost no duplicate code. So this is my preferred method for now.

If you could show me and example and/or mentor me, I might be able to see things differently.

Eager to hear your thoughts.

deecay avatar Oct 19 '20 11:10 deecay

This feature is so close to completion, I can feel it. :)

borsi avatar Mar 22 '21 15:03 borsi

Any ETA on this?

mmcdermott avatar Aug 22 '21 20:08 mmcdermott

Currently only merging fixes directly related to the V10 beta. We will pipeline this for merge after the V10 release next month.

susodapop avatar Aug 23 '21 15:08 susodapop

Any updates on this

gaganc-on avatar Feb 17 '22 19:02 gaganc-on

would be great if this would make it into a release!

thrau avatar Jul 24 '22 22:07 thrau

Thanks for the ping! Will aim to merge and include this in V11 this summer.

susodapop avatar Jul 24 '22 23:07 susodapop

@susodapop Any news on this PR? This would be a great feature

beknazar avatar Oct 24 '22 05:10 beknazar

@susodapop yeah just wanted to bump this too!

nathanlmeyers avatar Nov 02 '22 07:11 nathanlmeyers

Bump. This feature would be very useful.

fibr avatar Mar 21 '23 20:03 fibr

@deecay , thanks for the PR! We've updated a lot of things now that we're Community-driven so - if you're still interested in getting this merged - would you mind rebasing off master to re-run the CI, as well as updating merge conflicts?

We're trying to clean up our PR todo list, so if you're not interested, that's fine - we'll close the PR in about a week if we don't hear back. If you're interested in reopening the PR afterwards, we would also very much welcome that.

guidopetri avatar Jul 15 '23 19:07 guidopetri

Yeah, this looks like it'd be really useful. Hopefully it gets rebased. :smile:

justinclift avatar Jul 16 '23 02:07 justinclift

@deecay do you have time to rebase this change? There are quite a few conflicts, but all of them appear to be easy to resolve

eradman avatar Aug 29 '23 17:08 eradman

Codecov Report

Merging #4426 (7f9287a) into master (650ec90) will increase coverage by 0.03%. Report is 12 commits behind head on master. The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4426      +/-   ##
==========================================
+ Coverage   61.07%   61.10%   +0.03%     
==========================================
  Files         157      158       +1     
  Lines       12834    12883      +49     
  Branches     1747     1753       +6     
==========================================
+ Hits         7838     7872      +34     
- Misses       4755     4765      +10     
- Partials      241      246       +5     

see 4 files with indirect coverage changes

codecov[bot] avatar Sep 18 '23 01:09 codecov[bot]

@guidopetri @justinclift Rebased.

deecay avatar Sep 18 '23 05:09 deecay

Awesome! :smile:

@eradman Any interest in reviewing this PR?

justinclift avatar Sep 18 '23 05:09 justinclift

Awesome! 😄

@eradman Any interest in reviewing this PR?

Yes, I'll take this change for a test drive

eradman avatar Sep 18 '23 12:09 eradman

I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.

Now that I spent some time with this I feel that the "binning" function should be more generic. In SQL we can bin values in may ways. For example this bins by powers of 2

WITH series AS (
    SELECT
      power(2, series-1)::int AS lo,
      power(2, series)::int AS hi
    FROM generate_series(0, 10) AS series
)
SELECT count(files), lo || '-' || hi AS range
FROM dir_stats
JOIN series ON (files >= series.lo AND files < series.hi)
GROUP BY series, lo, hi
ORDER BY lo

This could be easily adapted to date ranges, range categories, and so on.

A bar chart works fine, except that I do not believe that Redash supports multiple series per graph yet, so perhaps this is the feature we should focus on.

Secondarily, binning options could be added, but I feel that would be somewhat complicated to support multiple scales and data types.

eradman avatar Sep 18 '23 16:09 eradman

@eradman

I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.

Does the CPU spin for test deployed instance too? (2000 entries) https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493

deecay avatar Sep 18 '23 18:09 deecay

I ran into some trouble with larger data sets on Chrome and Firefox. With as few as 500 entries both browsers would spin at 90% CPU for several minutes.

Does the CPU spin for test deployed instance too? (2000 entries) https://deploy-preview-4426--redash-preview.netlify.com/queries/270#493

I seem to have no trouble with that demo. I'll try rebuilding my dev environment to see if this is repeatable

eradman avatar Sep 18 '23 19:09 eradman

Tried cleaning my test env (git clean, docker rm -f ...) and rebuilding dependencies I still hit the same problem with CPU. Is the demo link on netlify.com up to date?

What do the rest of you think?

My thoughts:

  1. The histogram feature should be a data filter, not a new graph type (Apologies in an advance! I know this contradicts early feedback)
  2. Binning on the client side won't scale very well, so this transform should happen on the server-side (this is the current limitation of pivot tables)
  3. The X axis should be able to handle dates and numbers

eradman avatar Sep 19 '23 13:09 eradman

@eradman

Is the demo link on netlify.com up to date?

I think it is. And I think I've found the reason for the high CPU usage.

This problem happens when the X-axis data is specified as, or looks like, a date. In these cases plotly treats x-axis as type: date. The binning function in histogram treats the value of bin size differently between number axis and dates axis. If the x-axis is numbers, size: 1 means 1.0. But if x-axis is date, then size: 1 means 1ms (!). Therefore, when you have a data with timeseries with range scale of months or years, plotly tries to bin that data with 1ms-bin and gets overloaded.

I will look into what I can do with date binning. (your thought #3)

deecay avatar Sep 20 '23 10:09 deecay

@eradman

I have added some features.

  1. Chart will not render if xAxis bin size is "too small". Here, "too small" means that the resulting bin count will be > 10000.
  2. xAxis bin size now works for date axis, too. Possible values for date axis bin size are:
    a. number in ms (86,400,000 will represent a day (1000 * 60 * 60 * 24) ) b. "M1", "M2", etc. where M1 is one month, M2 is two months. See Example2: https://deploy-preview-4426--redash-preview.netlify.app/queries/224#955

deecay avatar Oct 06 '23 12:10 deecay