PlotlyJS.jl

Slow histogram

Open jkrimmer opened this issue 6 years ago • 5 comments

The histogram method provided by PlotlyJS seems to be rather slow. I noticed this when trying to plot histograms for arrays containing about ten million entries. To speed things up, I've written a similar method based on StatsBase and PlotlyJS.bar plots, which has proven to be drastically faster in my use case.

Are there any reasons why PlotlyJS should not use StatsBase for the histogram creation?

Should you be interested, I'll gladly provide my approach.

jkrimmer, Sep 12 '17 12:09

That sounds very cool.

Right now PlotlyJS contains little to no custom logic for preparing plots -- it just makes it easy to collect arguments to pass to the plotly.js javascript library and then view/interact with the results.

That being said, I am definitely not opposed to finding better ways to achieve similar plots. I'd like to think a bit more about how and where to put the "extra" routines (or more efficient ones).

Any suggestions?

Maybe we can have a sub-module PlotlyJS.Contrib or PlotlyJS.Extra where things like this can go?

sglyon, Sep 12 '17 12:09

While I do not have any suggestions regarding where to put any additional methods, I'm ready to deliver my code:

using PlotlyJS
using StatsBase

function plotly_hist(x::AbstractVector{T}, nbins::Integer; normalize::Bool = true) where T <: Real

    # use StatsBase to create a histogram object
    hist = fit(Histogram, x, nbins = nbins, closed = :left)

    # obtain bar positions -> center of each interval
    # (broadcast over the edges so integer input does not force integer centers)
    edges = hist.edges[1]
    bins = (edges[1:end-1] .+ edges[2:end]) ./ 2

    # optionally convert the counts to relative frequencies (allocates a new array)
    y = normalize ? hist.weights ./ length(x) : hist.weights

    # return a bar trace; combine it with Layout(bargap = 0.0) when plotting
    return bar(x = bins, y = y)
end

With bargap = 0.0 this should look exactly the same as a plot created by histogram.
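
For comparison, here is a minimal side-by-side sketch of the two approaches. It inlines the StatsBase binning rather than calling plotly_hist so that it runs on its own; the sample data, the bin count of 50, and the variable names are only assumptions for illustration, and raw counts are used so that the bars are comparable to a default histogram trace.

```julia
using PlotlyJS
using StatsBase

# illustrative data; the original report used about ten million entries
x = randn(1_000_000)

# Julia-side binning: only bin centers and counts are handed to plotly.js
h = fit(Histogram, x, nbins = 50, closed = :left)
edges = h.edges[1]
centers = (edges[1:end-1] .+ edges[2:end]) ./ 2
tr_bar = bar(x = centers, y = h.weights)

# plotly.js-side binning: the full array is passed and binned by plotly.js
tr_hist = histogram(x = x, nbinsx = 50)

# bargap = 0.0 removes the gaps between bars so both plots look alike
layout = Layout(bargap = 0.0)
p_bar = plot(tr_bar, layout)
p_hist = plot(tr_hist, layout)
```

The two plots may still differ slightly whenever plotly.js and StatsBase settle on different bin edges.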

However, I've noticed a possible issue: Sometimes plotly seems to allow the use of an additional bin. So, if I set nbinsx = 10, the histogram consists of 11 bins. In this case, the results will differ since the histogram creation of StatsBase seems to enforce the limit of 10 bins.
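
One possible way to make the bin count predictable on the Julia side is to pass explicit edges to fit instead of nbins, which StatsBase documents as an approximate number of bins. The following is only a sketch: fixed_bin_hist is a hypothetical helper, not part of StatsBase or PlotlyJS, and it does not attempt to reproduce the binning rules of plotly.js.

```julia
using StatsBase

# Hypothetical helper: build exactly `nbins` equal-width bins spanning the data
# by handing explicit edges to `fit`.
function fixed_bin_hist(x::AbstractVector{<:Real}, nbins::Integer)
    lo, hi = extrema(x)
    # nudge the upper edge by one floating-point step so that the maximum
    # is meant to land inside the last left-closed bin
    edges = range(lo, nextfloat(float(hi)), length = nbins + 1)
    return fit(Histogram, x, edges, closed = :left)
end

hist = fixed_bin_hist(randn(10_000), 10)
nbins_used = length(hist.weights)  # number of bins in the fitted histogram
```

With explicit edges the number of bins is determined by the edge vector alone, so it cannot drift from the requested value.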

jkrimmer, Sep 12 '17 17:09

Thanks for posting!

I actually think this could turn into something really cool. My vision is having a PlotlyJS.Contrib module that contains two main things:

  1. Julia implementation of new trace types -- like the StemPlot from @ssfrr
  2. Julia implementations of existing trace types so that the data processing step can happen efficiently on the Julia side without incurring the overhead of (1) serializing/deserializing gobs of data into/out of JSON to pass to plotly.js and (2) running computations in a potentially slower js environment.

I think your example fits nicely into group (2). The thing we'll need to make sure to do is handle all the arguments that the standard plotly.js trace types handle.

(sorry for the stream of consciousness post here...)

Now that I am thinking/writing more about it, I think it might create a headache for us going forward to maintain compatibility with the built-in plotly.js trace types...

sglyon, Sep 15 '17 17:09

One of the things that I ran into with my custom stem plot implementation is that getting a good-looking plot initially wasn't too hard, but managing the state so that you could update the plot dynamically just like the other plots turned out to be tricky. In fact debugging that was what led me to write DeepDiffs.jl because I was tired of wading through big nested dictionaries with small differences. :)

I think that if we have a couple different trace types in a .Contrib submodule or something then we could figure out an API to make writing them more convenient as we discover things they have in common.

ssfrr, Sep 15 '17 19:09

That's good feedback!

Once I find some time I'll try to move the stem trace code and the alternative version of histogram above into a .Contrib module and we can start experimenting. I have some code for violin plots lying around somewhere, so that could probably live there too.

sglyon, Sep 18 '17 12:09