PlotlyJS.jl
Slow histogram
The histogram method provided by PlotlyJS seems to be rather slow. I noticed this while trying to plot histograms for arrays containing about ten million entries. To speed things up, I've written a similar method based on StatsBase and PlotlyJS.bar plots, which has proven to be drastically faster in my use case.
Are there any reasons why PlotlyJS should not use StatsBase for the histogram creation?
Should you be interested, I'll gladly provide my approach.
That sounds very cool.
Right now PlotlyJS contains little to no custom logic for preparing plots -- it just makes it easy to collect arguments to pass to the plotly.js javascript library and then view/interact with the results.
That being said, I am definitely not opposed to finding better ways to achieve similar plots. I'd like to think a bit more about how and where to put the "extra" routines (or more efficient ones).
Any suggestions?
Maybe we can have a sub-module PlotlyJS.Contrib or PlotlyJS.Extra where things like this can go?
While I do not have any suggestions regarding where to put any additional methods, I'm ready to deliver my code:
using PlotlyJS
using StatsBase

function plotly_hist(x::AbstractVector{T}, nbins::Integer; normalize::Bool = true) where T <: Number
    # use StatsBase to create a histogram object
    hist = fit(Histogram, x; nbins = nbins, closed = :left)
    edges = hist.edges[1]

    # obtain bar positions -> center of each interval
    # (broadcasting instead of similar(x, ...) avoids an InexactError for integer input)
    bins = (edges[1:end-1] .+ edges[2:end]) ./ 2

    # normalizing creates a new array of relative frequencies
    y = normalize ? hist.weights ./ length(x) : hist.weights

    return bar(x = bins, y = y)
end
With bargap = 0.0 this should look exactly the same as a plot created by histogram.
However, I've noticed a possible issue: Sometimes plotly seems to allow the use of an additional bin. So, if I set nbinsx = 10, the histogram consists of 11 bins. In this case, the results will differ since the histogram creation of StatsBase seems to enforce the limit of 10 bins.
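One way to sidestep the bin-count mismatch described above might be to compute the bins by hand so that exactly the requested number is produced. The following is a minimal sketch using only Base Julia. `fixed_bin_counts` is a hypothetical helper, not part of PlotlyJS or StatsBase, and it assumes equal-width, left-closed bins spanning the data range, with the maximum value placed in the last bin.

```julia
# Hypothetical helper (not part of PlotlyJS or StatsBase): count values into
# exactly `nbins` equal-width bins spanning [minimum(x), maximum(x)].
function fixed_bin_counts(x::AbstractVector{<:Real}, nbins::Integer)
    nbins > 0 || throw(ArgumentError("nbins must be positive"))
    lo, hi = extrema(x)
    width = (hi - lo) / nbins
    counts = zeros(Int, nbins)
    for v in x
        # The maximum lies on the right edge, so clamp it into the last bin.
        # If all values are equal (width == 0), everything goes into bin 1.
        k = width == 0 ? 1 : clamp(floor(Int, (v - lo) / width) + 1, 1, nbins)
        counts[k] += 1
    end
    # Bar positions are the centers of the intervals.
    centers = [lo + (k - 0.5) * width for k in 1:nbins]
    return centers, counts
end
```

The returned `centers` and `counts` could then be passed to `bar(x = centers, y = counts)` in the same way as in `plotly_hist`.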
Thanks for posting!
I actually think this could turn into something really cool. My vision is having a PlotlyJS.Contrib module that contains two main things:
- Julia implementation of new trace types -- like the StemPlot from @ssfrr
- Julia implementations of existing trace types so that the data processing step can happen efficiently on the Julia side without incurring the overhead of (1) serializing/deserializing gobs of data into/out of JSON to pass to plotly.js and (2) running computations in a potentially slower js environment.
I think your example fits nicely into the second group. The thing we'll need to make sure to do is handle all the arguments that the standard plotly.js trace types handle.
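To give a rough feel for the serialization overhead mentioned above, here is a small sketch that compares the size of a raw data array with the size of pre-binned centers and counts. It uses only Base Julia. `json_array` is a hypothetical stand-in for real JSON serialization, and the data, bin count, and binning over [0, 1] are assumptions chosen for illustration.

```julia
# Hypothetical stand-in for JSON array serialization (illustration only;
# a real JSON library would be used in practice).
json_array(v) = "[" * join(v, ",") * "]"

# Assumed example data: 100,000 evenly spaced values in [0, 1].
x = collect(range(0.0, 1.0; length = 100_000))
nbins = 50

# Bin on the Julia side: equal-width bins over [0, 1],
# with the value 1.0 clamped into the last bin.
counts = zeros(Int, nbins)
for v in x
    counts[clamp(floor(Int, v * nbins) + 1, 1, nbins)] += 1
end
centers = [(k - 0.5) / nbins for k in 1:nbins]

# Bytes that would be sent when passing raw data vs. binned data.
raw_bytes = sizeof(json_array(x))
binned_bytes = sizeof(json_array(centers)) + sizeof(json_array(counts))
println("raw: ", raw_bytes, " bytes, binned: ", binned_bytes, " bytes")
```

The binned payload grows with the number of bins rather than with the number of data points, which is where the savings for very large arrays would come from.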
(sorry for the stream of consciousness post here...)
Now that I am thinking/writing more about it I think that it might create a headache for us going forward to maintain compatibility with the built in plotly.js trace types...
One of the things that I ran into with my custom stem plot implementation is that getting a good-looking plot initially wasn't too hard, but managing the state so that you could update the plot dynamically just like the other plots turned out to be tricky. In fact debugging that was what led me to write DeepDiffs.jl because I was tired of wading through big nested dictionaries with small differences. :)
I think that if we have a couple of different trace types in a .Contrib submodule or something, then we could figure out an API to make writing them more convenient as we discover things they have in common.
That's good feedback!
Once I find some time I'll try to move the stem trace code and the alternative version of histogram above into a .Contrib module and we can start experimenting. I have some code for violin plots lying around somewhere, so that could probably live there too.