Hmisc
Ecdf: curves colours plotted in a different order than specified
When there are multiple curves and the group argument is used, the colours can get rearranged.
Steps to reproduce:
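library(Hmisc)
set.seed(1)                          # optional: makes the random draws reproducible
reds <- rnorm(n=100, mean=5, sd=1)   # sample centred at 5
blues <- rnorm(n=100, mean=0, sd=1)  # sample centred at 0
# Colours are listed in the order in which the groups first appear in the data
Ecdf(x=c(reds, blues), group=c(rep("red", length(reds)), rep("blue", length(blues))), col=c("red", "blue"))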
Observe that the reds distribution is plotted in blue and the blues distribution is plotted in red.
This happens because group is converted to a factor in ecdf.s:
group <- as.factor(group)
lev <- levels(group)
nlev <- length(lev)
The levels are not guaranteed to follow the order of first occurrence: as.factor() builds them from the sorted unique values, so lev is now in alphabetical order.
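The sorting is easy to confirm in isolation. This is a minimal sketch that uses a short hypothetical group vector in place of the full reproduction data:

```r
# Hypothetical group vector with the same two labels as the reproduction
group <- c(rep("red", 3), rep("blue", 3))

# as.factor() takes its levels from the sorted unique values,
# so "blue" comes before "red"
lev <- levels(as.factor(group))
print(lev)

# Order of first appearance, for comparison: "red" comes first
first_seen <- unique(group)
print(first_seen)
```

The two printed vectors are in opposite orders, which is exactly the mismatch between the data and the supplied colours.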
In the for loop over the nlev curves to be plotted, the data for each curve are selected in this alphabetical order. In our case the "blue" level is used first (i=1):
s <- group == lev[i]
x <- X[s]
But the colours are used in the original order:
lines(x, y, type="s", lty=lty[i], col=col[i], lwd=lwd[i])
In this case col[1] is still "red", so the "blue" group is drawn in red (and vice versa).
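The mismatch can be reproduced without plotting anything. The loop below is a simplified sketch that only mimics the pairing of lev[i] with col[i] described above; it is not the actual ecdf.s code, and drawn_with is a hypothetical name used for illustration:

```r
group <- c(rep("red", 3), rep("blue", 3))
col <- c("red", "blue")          # colours supplied in first-appearance order

lev <- levels(as.factor(group))  # sorted: "blue", then "red"
drawn_with <- character(length(lev))
for (i in seq_along(lev)) {
  s <- group == lev[i]           # rows of level i (ecdf.s would use X[s])
  drawn_with[i] <- col[i]        # but the colour is taken by position i
}
names(drawn_with) <- lev
print(drawn_with)                # each group is paired with the other colour
```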
I consider it a serious bug. I have been presenting my research results based on Ecdf numerous times with no curve labelling...
The order of observations is not a reliable way to assign attributes. To get the behavior you want, make the group variable a factor and assign line attributes in order of the levels of that variable. If you want levels to be defined by the order of first appearance in the data (not a recommended programming practice), use something like g <- factor(x, levels=unique(x)).
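Following this advice, one possible workaround is sketched below. It assumes Hmisc is installed (the plot is simply skipped otherwise) and uses a named colour vector, col_by_group (a hypothetical name), indexed by levels(g), so each colour is tied to its level name rather than to its position:

```r
set.seed(1)
reds  <- rnorm(n = 100, mean = 5, sd = 1)
blues <- rnorm(n = 100, mean = 0, sd = 1)

# Group factor whose levels follow the order of first appearance
g <- c(rep("red", length(reds)), rep("blue", length(blues)))
g <- factor(g, levels = unique(g))

# Look colours up by level name so they line up with levels(g)
col_by_group <- c(red = "red", blue = "blue")
line_cols <- unname(col_by_group[levels(g)])

if (requireNamespace("Hmisc", quietly = TRUE)) {
  Hmisc::Ecdf(x = c(reds, blues), group = g, col = line_cols)
}
```

Because the lookup is by name, the colours stay correct even if g is later built with a different level order (for example the default alphabetical one).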