
Histogram issues: bar with height 1 is drawn in wrong place with stacked color, bincount acts confusingly

Open ilyagr opened this issue 5 years ago • 7 comments

I'm running Julia 1.5 and Gadfly v1.3.0.

I'm not completely sure this is a bug, but I think there should be distinct bars at x=1 and x=2 in the following bar chart:

    Gadfly.plot(DataFrame(c = [true, true, false], x=[1, 2, 1]), x="x", color="c",
                Geom.histogram)

[image: bad]

For comparison, without color, it looks better:

    Gadfly.plot(DataFrame(c = [true, true, false], x=[1, 2, 1]), x="x", Geom.histogram)

[image: ok]

One possible issue is that the bars are too wide. I tried to adjust for that by changing the x value 2 to 5 and increasing bincount, but that also had an unexpected effect (which might also be a bug): instead of the bars getting narrower, the x axis was extended far to the right.

    Gadfly.plot(DataFrame(c = [true, true, false], x=[1, 5, 1]), x="x", color="c",
                Geom.histogram(bincount=10))

[image: xaxis]

The best workaround I found so far is to abandon histograms and stacking, and use Stat.histogram with point geometry.

ilyagr Aug 11 '20 00:08

Try

using DataFrames, Gadfly
using Compose # for cx, cy units
df = DataFrame(c=[true, true, false], x=[1, 2, 1])
plot(df, x=:x, color=:c, Geom.histogram, Scale.x_discrete, Theme(bar_spacing=0.5cx))

bar_spacing can be in relative (e.g. 0.1w), absolute (e.g. 5mm), or plot context units (e.g. 0.5cx).

Mattriks Aug 11 '20 00:08

Thanks for the suggestion, it helps. Is there any way to make it work with continuous scales? My actual data set is continuous.

Also, something like the following looks wrong -- the bars are ordered as 1, 5, 3.

Gadfly.plot(DataFrame(c = [true, true, false, false], x=[1, 5, 1, 3]),
            x="x", color="c", Geom.histogram, Scale.x_discrete,
            Theme(bar_spacing=0.5 * Gadfly.Compose.cx))

ilyagr Aug 11 '20 01:08

You can set the order of the levels explicitly, e.g. Scale.x_discrete(levels=[1,3,4,5]). See the Scales section in the Tutorial.

If your scale is really continuous, you can set e.g. Geom.histogram(limits=(min=0, max=5), bincount=5), see histogram examples in the plot gallery.
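As a minimal sketch of that suggestion, here it is applied to the data frame from the original post, with the color grouping kept. The names `df` and `p` are only illustrative, and the exact appearance may differ between Gadfly versions.

```julia
using DataFrames, Gadfly

# Data from the original post: two rows at x=1 (one per color) and one at x=2.
df = DataFrame(c = [true, true, false], x = [1, 2, 1])

# Fix the histogram range and the number of bins explicitly,
# instead of relying on Gadfly's automatic bin selection.
p = plot(df, x = :x, color = :c,
         Geom.histogram(limits = (min = 0, max = 5), bincount = 5))
```

With limits of 0 to 5 and five bins, each bin is one unit wide, so x=1 and x=2 should end up in separate bars.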

Mattriks Aug 11 '20 01:08

That works, thank you very much! I'm not sure if it'd be easy, but it'd be nice if setting bincount didn't affect the limits, and if the defaults were better.

There is one more bug. On the log scale, bars of height 1 disappear, even if I force the y axis to extend below 1:

Gadfly.plot(DataFrame(c = [true, true, false], x=[1, 3, 1]), x="x",  
  Gadfly.Scale.y_log10(minvalue=0.5), 
  Geom.histogram(minbincount=5, limits=(min=0, max=4)))

[image: logscale]

(My actual example has both log scale and colors, so in my mind all of these issues are related, but perhaps that should be a separate bug).

Update: I had the wrong code pasted before (without the minvalue), this is now fixed.

ilyagr Aug 11 '20 01:08

What's in your original post isn't a bug; Gadfly is simply choosing automatic bins (which you can set manually as shown above; I'd suggest using bincount rather than minbincount in Geom.histogram).

The 2nd issue here about using Geom.histogram with Scale.y_log10 is tricky, because a histogram y-axis typically starts from zero. Perhaps try using Scale.y_sqrt instead.

Mattriks Aug 11 '20 01:08

Currently, it seems that histograms are hard-coded to bottom out at 1.0 when drawn on a log scale. Perhaps if you just change them to bottom out at 0.8 (and the default scale to start at 0.8), that would be at least a temporary workaround?

It's not quite perfect, as it doesn't help when density = true.
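The arithmetic behind this can be sketched in plain Julia. This assumes a bar is drawn from a fixed baseline up to its count on a log10 axis, which is a guess at the behavior rather than a reading of the Gadfly source; `bar_extent` is a hypothetical helper, not a Gadfly function.

```julia
# Hypothetical helper (not part of Gadfly): vertical extent of a bar on a
# log10 axis, assuming it is drawn from `baseline` up to `height`.
bar_extent(height, baseline) = log10(height) - log10(baseline)

h_count1_base1  = bar_extent(1, 1.0)     # zero: a count of 1 collapses onto the baseline
h_count1_base08 = bar_extent(1, 0.8)     # small positive extent: the bar becomes visible
h_density       = bar_extent(0.25, 0.8)  # negative: a density below 0.8 falls under the baseline
```

The last case is why a fixed baseline of 0.8 would not cover density = true, where bar heights can be well below 1.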

Thank you again for the help.

ilyagr Aug 11 '20 01:08

Limits issue noted on discourse: https://discourse.julialang.org/t/unexpected-behaviour-for-custom-histogram-limits-in-gadfly/

Mattriks Aug 25 '20 23:08