StatsPlots.jl

Pairplot with different colors for each group

Open kirtsar opened this issue 6 years ago • 3 comments

For exploratory data analysis, it would be nice to have something like R's usual pairs() function: https://1.bp.blogspot.com/-R1Dqq68sVPY/UTS-6Sf_reI/AAAAAAAAAGM/1N_QuNuCDyY/s1600/iris.png

There is currently no support for this. My version is somewhat ugly, but it does the job. It takes two arguments: X is the DataFrame and y is the categorical variable.

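using StatsPlots    # re-exports the Plots functions used below
using MLLabelUtils  # assumption: nlabel and convertlabel come from this package

function pairplot(X, y)
    colnames = String.(names(X))
    classes = nlabel(y)
    n = size(X, 2)
    # integer class index for every row, used to pick the marker color
    ylab = convertlabel(1:classes, y)
    # plotter is splatted column by column into a row-major grid at the end,
    # so plotter[i, j] is drawn in row j, column i of the figure
    plotter = Matrix{Any}(undef, n, n)

    # first panel, with labels
    plotter[1, 1] = histogram(X[:, 1],
                              ylabel = colnames[1],
                              title = colnames[1])
    # first figure column (scatter plots) and first figure row (titles only)
    for j in 2:n
        plotter[1, j] = scatter(X[:, 1], X[:, j],
                                markercolor = ylab,
                                ylabel = colnames[j])
        plotter[j, 1] = plot(title = colnames[j])
    end

    # diagonal
    for i in 2:n
        plotter[i, i] = histogram(X[:, i])
    end

    # upper triangle of the figure: empty panels
    for i in 1:n
        for j in 2:(i - 1)
            plotter[i, j] = plot()
        end
    end

    # lower triangle of the figure: scatter plots colored by class
    for i in 2:n
        for j in (i + 1):n
            plotter[i, j] = scatter(X[:, i], X[:, j],
                                    markercolor = ylab)
        end
    end

    plot(plotter...,
         layout = grid(n, n),
         legend = false)
end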

For the Iris dataset it looks like this: (image: corrplot)
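A note on panel placement in the function above: splatting a Matrix passes its entries in column-major order, while a Plots.jl grid layout is, as far as I know, filled row by row. The sketch below uses only Base Julia to show the two orders; the cell labels are made up for illustration.

```julia
# 2x3 matrix of made-up labels: "r<i>c<j>" sits at row i, column j
cells = ["r$(i)c$(j)" for i in 1:2, j in 1:3]

# Splatting (as in plot(plotter...)) walks the matrix column by column
splat_order = [cells...]

# Row-by-row order is obtained by transposing before splatting
row_order = [permutedims(cells)...]

println(splat_order)
println(row_order)
```

Under that assumption, plotter[i, j] ends up in row j, column i of the figure, which is presumably why the y-labels are attached to plotter[1, j] and the titles to plotter[j, 1]. Splatting permutedims(plotter) instead would keep matrix and figure positions aligned.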

kirtsar, Feb 25 '19 15:02

You can get something similar with @df iris corrplot(cols(1:4), group = :Species), but not the different colors for the groups: (image: screenshot 2019-02-25 16:57:53)

That's because the markers already have a color, defined by the correlation coefficient. That could possibly be changed, so that you could pass a vector of colors to markercolor. Would that be desirable?

mkborregaard, Feb 25 '19 15:02

Using this macro I was able to do something like:

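using Colors  # provides distinguishable_colors

function pairplot(X, y)
    classes = nlabel(y)
    ylab = convertlabel(1:classes, y)
    # one distinguishable color per class, expanded to one color per row
    ycol = distinguishable_colors(classes)[ylab]
    # the original call used a global `data` with a :Species column;
    # here the arguments X and y are used instead
    @df X corrplot(cols(1:size(X, 2)), group = y, markercolor = ycol)
end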

The result is: (image: corrplot2)

The result is good, even though I don't know how to pick "nice", pleasant colors. Maybe it would be a good idea to check whether markercolor is specified; if it is (as a categorical array), then pass a color vector to markercolor according to category?
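One way the category-to-color idea could be sketched, using only Base Julia. The helper name category_colors and the default palette are hypothetical, chosen for illustration; nothing here is part of StatsPlots.

```julia
# Hypothetical helper (not part of StatsPlots): one palette entry per category.
function category_colors(y, palette = [:steelblue, :darkorange, :seagreen, :purple])
    levels = unique(y)  # categories in order of first appearance
    length(levels) <= length(palette) ||
        throw(ArgumentError("palette has fewer colors than there are categories"))
    # map each category to its position, then look up a color for every element
    index = Dict(level => i for (i, level) in enumerate(levels))
    return [palette[index[v]] for v in y]
end

species = ["setosa", "setosa", "versicolor", "virginica", "versicolor"]
colors = category_colors(species)
println(colors)
```

The resulting vector has one color per observation, so it could be passed as markercolor in the same way as ycol above, assuming the plot recipe accepts a per-point color vector.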

kirtsar, Feb 25 '19 16:02

In looking into scattermatrix-plots, I came across this. Friendly bump to say that I would like this feature ^_^

KronosTheLate, Mar 21 '22 12:03