StatsPlots.jl
Pairplot with different colors for each group
It would be nice for exploratory data analysis to have something like the usual pairs() function in R: https://1.bp.blogspot.com/-R1Dqq68sVPY/UTS-6Sf_reI/AAAAAAAAAGM/1N_QuNuCDyY/s1600/iris.png
Currently there is no support for this. My version is somewhat ugly, but it does the job. It takes two arguments: X is the DataFrame and y is the categorical variable.
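using StatsPlots, DataFrames, MLLabelUtils  # nlabel and convertlabel come from MLLabelUtils

function pairplot(X, y)
    colnames = String.(names(X))
    classes = nlabel(y)                  # number of distinct categories in y
    n = size(X, 2)                       # number of variables
    ylab = convertlabel(1:classes, y)    # integer class code (1..classes) per row
    plotter = Matrix{Any}(undef, n, n)
    # plot(plotter...) splats the matrix column by column while grid(n, n) fills
    # panels row by row, so plotter[i, j] is drawn in row j, column i.
    # first column of the figure, with axis labels
    plotter[1, 1] = histogram(X[:, 1],
                              ylabel = colnames[1],
                              title = colnames[1])
    for j in 2:n
        plotter[1, j] = scatter(X[:, 1], X[:, j],
                                markercolor = ylab,
                                ylabel = colnames[j])
        plotter[j, 1] = plot(title = colnames[j])  # empty top-row panel carrying the title
    end
    # diagonal: histogram of each variable
    for i in 2:n
        plotter[i, i] = histogram(X[:, i])
    end
    # above the diagonal of the figure: empty panels
    for i in 1:n
        for j in 2:(i - 1)
            plotter[i, j] = plot()
        end
    end
    # below the diagonal of the figure: variable i (x axis) against variable j (y axis)
    for i in 2:n
        for j in (i + 1):n
            plotter[i, j] = scatter(X[:, i], X[:, j],
                                    markercolor = ylab)
        end
    end
    plot(plotter...,
         layout = grid(n, n),
         legend = false)
end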
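The layout above relies on a detail that is easy to miss: splatting a Matrix iterates it column by column, while (as far as I can tell) Plots fills a grid layout row by row, so plotter[i, j] lands in row j, column i. That is why plotter[1, j], which carries the ylabel, ends up in the left column. Below is a minimal sketch in plain Julia, with no plotting, of the intended display grid; panel_layout and row_major_order are hypothetical helpers for illustration, not part of StatsPlots.

```julia
# Hypothetical helper (not part of StatsPlots): the kind of panel a pairplot
# shows at display position (row r, column c) of an n×n grid.
function panel_layout(n::Integer)
    kinds = Matrix{Symbol}(undef, n, n)
    for r in 1:n, c in 1:n
        if r == c
            kinds[r, c] = :histogram   # diagonal: distribution of one variable
        elseif r > c
            kinds[r, c] = :scatter     # below the diagonal: column var vs row var
        else
            kinds[r, c] = :blank       # above the diagonal: left empty
        end
    end
    return kinds
end

# Splatting a Matrix walks it column by column. Transposing first yields the
# entries in reading (row-major) order, the order a grid layout is filled in.
row_major_order(m::AbstractMatrix) = vec(permutedims(m))

kinds = panel_layout(3)
display(kinds)
println(row_major_order(kinds))
```

If plotter were built in display orientation (plotter[row, col]), splatting row_major_order(plotter) instead of plotter would put every panel in its intended place; the original function sidesteps this by filling the matrix already transposed.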
For the iris dataset it looks like this:
You can get something similar with @df iris corrplot(cols(1:4), group = :Species), but not with different colors for the groups. That's because the markers already have a color, defined by the correlation coefficient. That could possibly be changed so that you could pass a vector of colors to markercolor. Would that be desirable?
Using this macro I was able to do something like:
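using StatsPlots, MLLabelUtils, Colors  # distinguishable_colors comes from Colors.jl

function pairplot(X, y)
    classes = nlabel(y)                           # number of distinct categories
    ylab = convertlabel(1:classes, y)             # integer class code per row
    ycol = distinguishable_colors(classes)[ylab]  # one color per row, chosen by class
    # assumes X is the full iris DataFrame: numeric columns 1:4 plus :Species
    @df X corrplot(cols(1:4), group = :Species, markercolor = ycol)
end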
The result is:
The result is good, even though I don't know how to pick "nice", pleasant colors.
Maybe it is a good idea for corrplot to check whether markercolor is specified and, if it is given as a categorical array, map each category to a color and pass the resulting color vector to markercolor?
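The proposal amounts to turning a vector of category labels into one color per observation, which is what nlabel, convertlabel and distinguishable_colors do together in the snippet above. Here is a minimal sketch of that mapping in plain Julia; category_colors and the palette are hypothetical, not an existing StatsPlots option, and the palette entries are just named colors picked for illustration.

```julia
# Hypothetical helper (not a StatsPlots API): give each distinct category one
# palette entry, then return the matching color for every observation in y.
function category_colors(y::AbstractVector, palette::AbstractVector)
    levels = sort(unique(y))   # sorted, so the mapping does not depend on row order
    length(levels) <= length(palette) ||
        throw(ArgumentError("need $(length(levels)) colors, palette has $(length(palette))"))
    code = Dict(level => k for (k, level) in enumerate(levels))  # category => integer code
    return [palette[code[v]] for v in y]
end

# Illustrative palette of named colors; any vector of colors would do.
palette = [:royalblue, :darkorange, :seagreen]

species = ["setosa", "virginica", "setosa", "versicolor"]
println(category_colors(species, palette))
```

The returned vector plays the same role as ycol above, so it could be passed as markercolor. Sorting the levels keeps a category's color stable across datasets whose rows come in a different order.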
While looking into scatter-matrix plots, I came across this. Friendly bump to say that I would like this feature too ^_^