
Heatmaps with marginal dendrograms

Open dcjones opened this issue 9 years ago • 12 comments

This is one of the few things I still load up R to plot.

I think there are two parts to implementing this: Stat.heatmap, which computes pairwise distances and optionally clusters columns and rows, and Guide.dendrogram, which draws marginal dendrograms.

dcjones • Apr 03 '15 15:04
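As a rough illustration of the first part described above, here is a minimal sketch of the numbers a Stat.heatmap would have to produce: a pairwise distance matrix between rows, reordered by a leaf order. It uses only the LinearAlgebra standard library. The names `pairwise_rows` and `reorder` are hypothetical and not part of Gadfly, and the order `[1, 3, 2]` is made up for the example; in practice it would come from a clustering step.

```julia
using LinearAlgebra  # standard library, provides norm

# Pairwise Euclidean distances between the rows of X
# (n observations by p variables). Hypothetical helper, not a Gadfly function.
function pairwise_rows(X::AbstractMatrix{<:Real})
    n = size(X, 1)
    D = zeros(Float64, n, n)
    for i in 1:n, j in (i + 1):n
        # fill both triangles so that D is symmetric
        D[i, j] = D[j, i] = norm(X[i, :] .- X[j, :])
    end
    return D
end

# Reorder rows and columns of a distance matrix by a leaf order,
# such as the one a hierarchical clustering would return.
reorder(D::AbstractMatrix, order::AbstractVector{<:Integer}) = D[order, order]

X = [0.0 0.0;
     3.0 4.0;
     0.0 1.0]
D = pairwise_rows(X)
Dord = reorder(D, [1, 3, 2])  # assumed order that puts rows 1 and 3 side by side
```

After reordering, the two closest rows (1 and 3) are adjacent, which is what makes the block structure of a clustered heatmap visible.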

This would be really cool! It might also be useful to be able to place a Compose context in the margins. It seems Guide.annotation doesn't allow you to go outside the plot panel at the moment? I've been attempting some ropey workarounds, putting coloured rectangles in the plot panel as a guide to cluster segments for non-hierarchical clustering. I could imagine other row and column summaries that might be nice, such as averages (perhaps as line plots) or the number of instances in particular cluster segments, where that makes sense.

nw11 • May 11 '15 09:05

> It seems Guide.annotation doesn't allow you to go outside the plot panel at the moment?

It doesn't, but that would actually be extremely easy to change. I do want to do much more with guides, like marginal histograms and "rugs".

dcjones • May 11 '15 15:05

This would be a really nice feature. One thing that I think would make it even nicer is to allow some other Stat.foo to feed into Guide.dendrogram. The use case I have in mind is one in which the heatmap quantifies some type of interaction (say, the binding affinity of columns to rows), whereas the dendrograms show phylogenetic relationships (or any other hierarchical relationship) among those same rows and columns. The hierarchies could be specified in the DataFrame separately from the interactions, which may or may not have any logical relationship to them.

gajomi • May 13 '15 23:05

Hi! I'm sorry if this has been asked elsewhere, but I am looking for a good way to plot a clustered heatmap like this one in Julia without having to go back to R. Good plot recipes have been contributed, such as in https://github.com/JuliaPlots/StatsPlots.jl/pull/214, but they're still not as good as heatmap.2 in R's gplots, or even heatmaply (R's plotly offline).

Could you please let me know whether this has been implemented, or whether there is another way to do this? I tried Stat.heatmap and Guide.dendrogram in Gadfly, and as far as I can see they haven't been implemented yet.

srgk26 • Apr 06 '19 13:04

For Gadfly, you can write custom statistics (#894):

module stat

using Distances, Gadfly

struct HeatMapStatistic <: Gadfly.StatisticElement
   metric::Metric 
end
heatmap(; metric=Euclidean()) = HeatMapStatistic(metric)


Gadfly.Stat.input_aesthetics(stat::HeatMapStatistic) =  [:z, :x, :y]
Gadfly.Stat.output_aesthetics(stat::HeatMapStatistic) = [:xmin, :xmax, :ymin, :ymax]
Gadfly.Stat.default_scales(stat::HeatMapStatistic) = [Scale.z_func(), Scale.x_discrete(), Scale.y_discrete(), Scale.color_continuous()]


function Gadfly.Stat.apply_statistic(stat::HeatMapStatistic,
                         scales::Dict{Symbol, Gadfly.ScaleElement},
                         coord::Gadfly.CoordinateElement,
                         aes::Gadfly.Aesthetics)

    xs = repeat(aes.x, outer=length(aes.y))
    aes.y = repeat(aes.y, inner=length(aes.x))
    aes.x = xs
    dist = pairwise(stat.metric, aes.z, dims=1)
    Stat.apply_statistic(Stat.rectbin(), scales, coord, aes)
    color_scale = get(scales, :color, Scale.color_continuous())  # fallback must be a scale instance, not the constructor
    Scale.apply_scale(color_scale, [aes], Gadfly.Data(color=vec(dist)))
    aes.color_key_title = string(typeof(stat.metric), "\n","distance")
end

end # module

using Distances, DataFrames, Gadfly, RDatasets
iris = dataset("datasets","iris")
Z = Matrix{Float64}(iris[1:75,1:4])  
p = plot(z=Z, x=1:75, y=1:75, stat.heatmap(metric=Jaccard()), Geom.rectbin,
    Scale.color_continuous(colormap=Scale.lab_gradient("steelblue3","white","darkred")),
    Guide.xticks(ticks=[1:2:75;]), Guide.yticks(ticks=[1:2:75;]),
    Coord.cartesian(aspect_ratio=1.0)
    )

Now need to figure out an easy way to do the marginal dendrograms ...

Mattriks • Apr 07 '19 07:04

Here's the dendrograms:

# module stat 
# add these lines to module stat above

using Compose, Clustering, StatsBase
PositionedGuide = Gadfly.Guide.PositionedGuide
top_guide_position = Gadfly.Guide.top_guide_position
right_guide_position = Gadfly.Guide.right_guide_position

function branches(hc::Hclust, useheight::Bool=true)
    order = StatsBase.indexmap(hc.order)
    nodepos = Dict(-i => (float(order[i]), 0.0) for i in hc.order)

    branches1 = Vector{NTuple{2, Float64}}[]
    branches2 = Vector{NTuple{2, Float64}}[]
    ypos = 0.0
    
    for i in 1:size(hc.merges, 1)
        x1, y1 = nodepos[hc.merges[i, 1]]
        x2, y2 = nodepos[hc.merges[i, 2]]

        xpos = (x1 + x2) / 2
        h = useheight ? hc.heights[i] : 1.0
        ypos = max(y1, y2) + h
        
        nodepos[i] = (xpos, ypos)
        push!(branches1, [(x1,y1), (x1,ypos), (x2,ypos), (x2,y2)])
        push!(branches2, [(y1,x1), (ypos,x1), (ypos,x2), (y2,x2)])
    end

    return (branches1, branches2, ypos)
end

struct Dendrogram <: Gadfly.GuideElement
end

function Gadfly.Guide.render(guide::Dendrogram, theme::Gadfly.Theme, aes::Gadfly.Aesthetics)
    hc = hclust(pairwise(Jaccard(), aes.z, dims=1))
    branches1, branches2, ypos = branches(hc)
    n = size(aes.z,1)
    ctx = context(units=UnitBox(0.5, ypos, n, -ypos, bottompad=4px), minheight=ypos*25)
    compose!(ctx, line(branches1), stroke("black"), linewidth(0.3pt))
    ctx2 = context(units=UnitBox(0, n+0.5,  ypos, -n, leftpad=4px), minwidth=ypos*25)
    compose!(ctx2, line(branches2), stroke("black"), linewidth(0.3pt))
    return [PositionedGuide([ctx], 0, top_guide_position), 
                   PositionedGuide([ctx2], 0, right_guide_position) ]
end

# end module stat


iris = dataset("datasets","iris")
Z = Matrix{Float64}(iris[26:75,1:4])  
n = size(Z,1)
p = plot(z=Z, x=1:n, y=1:n, stat.heatmap(metric=Jaccard()), Geom.rectbin,
    Scale.color_continuous(colormap=Scale.lab_gradient("steelblue3","white","darkred")),
    Coord.cartesian(aspect_ratio=1.0), stat.Dendrogram() 
)

[Image: hmap_iris]

This could be developed as a separate package (note it depends on package Clustering), and there is a lot more to do here. To make this easier, some coding in Gadfly would also help, including:

  • a generic aes field that can be used to pass the Stat.heatmap metric to the plot aes, so that it can be used by Guide.dendrogram.
  • the height/width of the Guides should be controllable through Theme, in a generic way. I'm thinking e.g. of a vector of the same length as the number of guides, for each side.

Mattriks • Apr 14 '19 13:04
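To make the branch coordinates in the dendrogram code above more concrete, here is a small worked sketch in plain Julia that takes the hierarchy as ordinary arrays instead of an Hclust object, so it runs without Clustering. It is not the code from the post. The function name `branch_paths` is hypothetical, and it places each merge directly at its given height, which is one common convention and differs from the cumulative `max(y1, y2) + h` used in `branches` above. The three-leaf tree is made up for the example.

```julia
# Branch polylines for a dendrogram supplied as plain arrays.
# merges[i, :] holds the two children joined at step i: a negative entry -k
# is leaf k, a positive entry m is the subtree created at step m.
# Hypothetical helper for illustration only.
function branch_paths(merges::Matrix{Int}, heights::Vector{Float64}, order::Vector{Int})
    # x position of each leaf is its rank in the plotting order; leaves sit at y = 0
    nodepos = Dict{Int, Tuple{Float64, Float64}}()
    for (rank, leaf) in enumerate(order)
        nodepos[-leaf] = (float(rank), 0.0)
    end

    paths = Vector{NTuple{2, Float64}}[]
    for i in 1:size(merges, 1)
        x1, y1 = nodepos[merges[i, 1]]
        x2, y2 = nodepos[merges[i, 2]]
        y = heights[i]                      # assumption: height is absolute, not cumulative
        nodepos[i] = ((x1 + x2) / 2, y)     # new subtree sits midway between its children
        # up from the first child, across, and down to the second child
        push!(paths, [(x1, y1), (x1, y), (x2, y), (x2, y2)])
    end
    return paths
end

merges  = [-1 -3; -2 1]   # step 1 joins leaves 1 and 3; step 2 joins leaf 2 with subtree 1
heights = [1.0, 4.0]
order   = [1, 3, 2]       # left-to-right leaf order
paths = branch_paths(merges, heights, order)
```

Each element of `paths` is a four-point polyline of the same shape as the entries of `branches1` in the post. Because the hierarchy is just data here, it could in principle come from somewhere other than the heatmap values, for example a phylogeny, as suggested earlier in the thread.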

i'm not averse to adding Clustering as a dependency to Gadfly to support such commonly useful plots.

bjarthur • Apr 15 '19 11:04

> i'm not averse to adding Clustering as a dependency to Gadfly to support such commonly useful plots.

What about using Requires for the Clustering related code so it only gets loaded if Clustering is imported?

tlnagy • Apr 26 '19 21:04
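For readers unfamiliar with the Requires suggestion above, this is roughly what conditional loading looks like. It is only a sketch: the module name `DendrogramGlue` and the helper `leaforder` are hypothetical, this is not how Gadfly is organised, and the UUID shown is the one Clustering.jl is believed to be registered under, so it should be checked against that package's Project.toml before use.

```julia
module DendrogramGlue  # hypothetical module name, for illustration only

using Requires

function __init__()
    # The block below is evaluated only once Clustering has been loaded
    # in the session, so Clustering is not a hard dependency of this module.
    # Assumption: verify this UUID against Clustering.jl's Project.toml.
    @require Clustering="aaaa29a8-35af-508c-8bc3-b662a17a0fe5" begin
        # hypothetical helper: leaf order of a hierarchical clustering
        # computed from a square distance matrix D
        leaforder(D::AbstractMatrix) = Clustering.hclust(D).order
    end
end

end # module
```

With this layout, `DendrogramGlue.leaforder` only exists after the user runs `using Clustering`; code that loads the module without Clustering never pays for it.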

Check out Heatmap.jl! Note that this is a WIP (and there are currently issues to be fixed).

Mattriks • Apr 28 '19 12:04

Heatmap.jl looks great! do you foresee it getting folded into Gadfly at some point?

bjarthur • May 05 '19 11:05

> Check out Heatmap.jl! Note that this is a WIP (and there are currently issues to be fixed).

It looks great! Thank you. I'll wait for it to be fixed. Just a question: does it accept NA/NaN values in the dataset? In the heatmap.2 function of R's gplots, the na.color option lets you assign a colour to NA values. Is there an equivalent option in your package?

srgk26 • May 12 '19 17:05

> Check out Heatmap.jl! Note that this is a WIP (and there are currently issues to be fixed).

Cool package. Will there be an update on this? Thanks!

xinkai-zhou • Nov 10 '20 04:11