
Ecdf: curves colours plotted in a different order than specified

Open balwierz opened this issue 8 years ago • 1 comments

When there are multiple curves and the group argument is used, the colours can end up assigned to the wrong curves. Steps to reproduce:

    library(Hmisc)
    reds <- rnorm(n=100, mean=5, sd=1)
    blues <- rnorm(n=100, mean=0, sd=1)
    Ecdf(x=c(reds, blues),
         group=c(rep("red", length(reds)), rep("blue", length(blues))),
         col=c("red", "blue"))

Observe that the reds distribution is plotted in blue, and the blues distribution is plotted in red.

This is because in ecdf.s group is converted to a factor:

    group <- as.factor(group)
    lev <- levels(group)
    nlev <- length(lev)

Levels are not guaranteed to be in the order of first occurrence. For a character vector, as.factor sorts the levels, so lev is now in alphabetical order.
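A minimal base-R sketch of the level ordering described above, independent of Hmisc. The variable names `lev_default` and `lev_first` are illustrative only and do not appear in ecdf.s.

```r
# Group labels in the same order as in the reproduction: "red" first, then "blue".
group <- c(rep("red", 3), rep("blue", 3))

# as.factor() on a character vector sorts the unique values to build the levels.
lev_default <- levels(as.factor(group))

# Passing levels explicitly keeps the order of first appearance instead.
lev_first <- levels(factor(group, levels = unique(group)))

print(lev_default)  # "blue" comes before "red"
print(lev_first)    # "red" comes before "blue"
```

So although "red" appears first in the data, it is the second level after the default conversion.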

In the for loop over the nlev curves to be plotted, the data are selected using the alphabetical order of the levels. In our case the "blue" level is used first (i=1):

    s <- group == lev[i]
    x <- X[s]

But the colours are used in the order in which they were passed:

    lines(x, y, type="s", lty=lty[i], col=col[i], lwd=lwd[i])

In this case col[1] is still "red", so the "blue" data are drawn in red.
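The mismatch can be traced without plotting anything. The sketch below only imitates the indexing quoted from the loop (lev[i] paired with col[i]); it is not the actual Ecdf code, and `assigned` is a hypothetical helper variable.

```r
# Same grouping and colour vector as in the reproduction.
group <- as.factor(c(rep("red", 100), rep("blue", 100)))
lev <- levels(group)        # sorted levels, so "blue" is lev[1]
col <- c("red", "blue")     # colours in the order the caller typed them

# Pair the i-th level with the i-th colour, as the plotting loop does.
assigned <- setNames(col[seq_along(lev)], lev)

# The names are the groups, the values are the colours they receive.
print(assigned)
```

Here the "blue" group receives the colour "red" and the "red" group receives "blue", which matches the swapped curves in the plot.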

I consider it a serious bug. I have been presenting my research results based on Ecdf numerous times with no curve labelling...

balwierz avatar Oct 24 '16 21:10 balwierz

The order of observations is not a reliable way to assign attributes. To get the behavior you want, make the group variable a factor and assign line attributes in order of the levels of that variable. If you want levels to be defined by the order of first appearance in the data (not a recommended programming practice), use something like g <- factor(x, levels=unique(x)).
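A sketch of the two approaches mentioned in this reply, using base R only. The names `labels`, `g_first`, `colour_by_group` and `cols` are illustrative assumptions; the commented Ecdf calls reuse only the arguments shown in the original report.

```r
set.seed(1)
reds  <- rnorm(n = 100, mean = 5, sd = 1)
blues <- rnorm(n = 100, mean = 0, sd = 1)
x <- c(reds, blues)
labels <- c(rep("red", length(reds)), rep("blue", length(blues)))

# Approach 1: define levels by order of first appearance, so that a
# colour vector written in data order lines up with the levels.
g_first <- factor(labels, levels = unique(labels))
cols_first <- c("red", "blue")
# Ecdf(x = x, group = g_first, col = cols_first)   # with library(Hmisc) loaded

# Approach 2: keep the default (sorted) levels and look the colours up
# by level name, so the order in which colours are typed no longer matters.
g <- factor(labels)
colour_by_group <- c(red = "red", blue = "blue")
cols <- unname(colour_by_group[levels(g)])
# Ecdf(x = x, group = g, col = cols)               # with library(Hmisc) loaded
```

The second approach follows the advice to assign line attributes in order of the factor levels, while keeping the group-to-colour mapping explicit in one place.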

harrelfe avatar Oct 25 '16 12:10 harrelfe