grpregOverlap

ExpandX is not working

Open · LamineTourelab opened this issue 2 years ago · 5 comments

Hi, I tried to run grpregOverlap, but I have an issue in the ExpandX function. The problem is in this part of it:

```r
## expand X to X.latent
X.latent <- NULL
names <- NULL

for (i in 1:nrow(incidence.mat)) {
  idx <- incidence.mat[i, ] == 1
  X.latent <- cbind(X.latent, X[, idx, drop = FALSE])
  names <- c(names, colnames(incidence.mat)[idx])
  # colnames(X.latent) <- c(colnames(X.latent), colnames(X)[incidence.mat[i,]==1])
}
```

Every time I try to run the function, I get the same error: the arguments do not have the same number of rows. That seems expected, because X.latent starts out as NULL. If someone has had the same issue and found a solution, it would be great to share it. Thanks!
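To make the loop above concrete, here is a small self-contained sketch of what it computes when X is a matrix. The toy data and the 2-group incidence matrix (one row per group, one column per variable, 1 meaning "variable belongs to group") are made up for illustration; only the loop body is taken from the code quoted above. Because gene2 sits in both groups, it is copied once per group into X.latent.

```r
## Made-up toy data: 5 observations, 3 variables with column names
set.seed(1)
X <- matrix(rnorm(5 * 3), nrow = 5, ncol = 3,
            dimnames = list(NULL, c("gene1", "gene2", "gene3")))

## Hypothetical incidence matrix: rows = groups, columns = variables
incidence.mat <- rbind(pathway1 = c(1, 1, 0),
                       pathway2 = c(0, 1, 1))
colnames(incidence.mat) <- colnames(X)

## Same loop as quoted above: each group gets its own copy of its columns
X.latent <- NULL
names <- NULL  # shadows base::names as a variable only, mirroring the original
for (i in seq_len(nrow(incidence.mat))) {
  idx <- incidence.mat[i, ] == 1
  X.latent <- cbind(X.latent, X[, idx, drop = FALSE])
  names <- c(names, colnames(incidence.mat)[idx])
}

dim(X.latent)  # 5 rows, 4 latent columns
names          # gene2 appears once per group it belongs to
```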

LamineTourelab, Apr 20 '22 13:04

I infer (from some correspondence we've had) that your data X is a data.frame, whereas I typically pass a matrix for X. Here's an example that I think will reproduce your problem and also show how things "work" when X is a matrix. I'm not yet clear on what's causing this, but I think this helps focus the troubleshooting.

```r
## install grpregOverlap from GitHub using devtools
install.packages("devtools")
devtools::install_github("YaoHuiZeng/grpregOverlap")
library(grpregOverlap)

## generate simple synthetic data with overlapping groups
n <- 10
p <- 3
X <- data.frame(
  gene1 = rnorm(n),
  gene2 = rnorm(n),
  gene3 = rnorm(n)
)
group <- list(
  "pathway1" = c("gene1", "gene2"),
  "pathway2" = c("gene2", "gene3")
)
y <- rnorm(n)

## fitting fails with output:
## Error in data.frame(..., check.names = FALSE) : 
##   arguments imply differing number of rows: 0, 10
fm <- grpregOverlap(X, y, group, returnX.latent = TRUE)

## same, but now X is a matrix
n <- 10
p <- 3
X <- matrix(rnorm(n*p), n, p)
colnames(X) <- c("gene1", "gene2", "gene3")
group <- list(
  "pathway1" = c("gene1", "gene2"),
  "pathway2" = c("gene2", "gene3")
)
y <- rnorm(n)

## fitting works
fm <- grpregOverlap(X, y, group, returnX.latent = TRUE)
```

dankessler, Apr 20 '22 16:04

It seems like the error is happening at the `cbind()` call in the loop that builds X.latent. From some testing, `cbind(NULL, X)` behaves differently depending on whether X is a matrix or a data.frame.

```
# cbind is able to bind NULL and matrix
> X = matrix(1:12, 4, 3)
> cbind(NULL, X)
     [,1] [,2] [,3]
[1,]    1    5    9
[2,]    2    6   10
[3,]    3    7   11
[4,]    4    8   12
# cbind is unable to bind NULL and data.frame
> X = data.frame(a = 1:4, b = 5:8, c = 9:12)
> cbind(NULL, X)
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 0, 4
```
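One way the expansion could be made robust to a data.frame X is sketched below. `expand_latent` is a hypothetical helper, not part of grpregOverlap, and the incidence matrix layout (rows = groups, columns = variables) is assumed from the loop quoted at the top of this issue. The `as.matrix()` coercion is what sidesteps the `cbind(NULL, <data.frame>)` problem shown above; collecting the per-group blocks in a list and binding them once with `do.call(cbind, ...)` is just a design choice that avoids starting from NULL and regrowing X.latent on every iteration.

```r
## Hypothetical helper (not grpregOverlap's code): expand X to X.latent
expand_latent <- function(X, incidence.mat) {
  X <- as.matrix(X)  # data.frame -> matrix; column names are kept
  blocks <- lapply(seq_len(nrow(incidence.mat)), function(i) {
    X[, incidence.mat[i, ] == 1, drop = FALSE]  # columns of group i
  })
  do.call(cbind, blocks)  # bind all per-group blocks in one call
}

## Made-up data passed as a data.frame, the case that failed above
set.seed(2)
X.df <- data.frame(gene1 = rnorm(4), gene2 = rnorm(4), gene3 = rnorm(4))
incidence.mat <- rbind(pathway1 = c(1, 1, 0), pathway2 = c(0, 1, 1))
colnames(incidence.mat) <- colnames(X.df)

X.latent <- expand_latent(X.df, incidence.mat)
dim(X.latent)       # 4 rows, 4 latent columns
colnames(X.latent)  # gene2 is duplicated, once per group
```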

For now, I'd suggest "working around" this by making X a matrix with named columns, like I did in my example code:

```r
X <- matrix(rnorm(n*p), n, p)
colnames(X) <- c("gene1", "gene2", "gene3")
```
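If your predictors are already in a data.frame, `as.matrix()` gets you the same kind of object, but note that a single character or factor column silently turns the whole result into a character matrix. A minimal sketch of a conversion with that check, where `to_numeric_matrix` is a hypothetical helper name and the data are made up:

```r
## Hypothetical helper: data.frame of predictors -> numeric matrix with colnames
to_numeric_matrix <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))  # check every column
  if (!all(is_num)) {
    stop("non-numeric columns: ", paste(names(df)[!is_num], collapse = ", "))
  }
  as.matrix(df)  # column names carry over from the data.frame
}

gene_df <- data.frame(gene1 = rnorm(10), gene2 = rnorm(10), gene3 = rnorm(10))
X <- to_numeric_matrix(gene_df)
is.matrix(X)
colnames(X)
```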

dankessler, Apr 20 '22 16:04

Hi Dan, I would like to thank you for your help. I finally resolved the problem, and I will share the solution here for anyone who runs into the same issue:

```r
# gene_present is my data.frame
X = as.matrix(gene_present)
y = traitData$labs
groups = sub_pathway$pathways
```

I had a binary outcome (R or NR), which I transformed to 1 = R and 0 = NR and stored in a column named labs. Then it should work:

```r
res = grpregOverlap::grpregOverlap(X, y, groups, returnX.latent = TRUE)
```

Thanks
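The R/NR recoding described above can be sketched as follows. The label vector `labs_raw` is made up for illustration; in the post, the recoded values end up in `traitData$labs`. Checking the raw labels first keeps a typo or an unexpected label from silently becoming 0.

```r
## Made-up raw response labels (R = responder, NR = non-responder)
labs_raw <- c("R", "NR", "NR", "R", "R")

## Fail loudly on anything other than the two expected labels
stopifnot(all(labs_raw %in% c("R", "NR")))

## 1 = R, 0 = NR, as described in the post
y <- ifelse(labs_raw == "R", 1, 0)
y
```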

LamineTourelab, Apr 21 '22 07:04

Glad it worked!

Just a statistical note: if your response is binary, you may want to use something other than the default `family = "gaussian"` in your call to grpregOverlap, for example logistic regression (with `family = "binomial"`, I think). But you know your data better than I do, so feel free to ignore this suggestion.
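A sketch of what that call might look like, with made-up data shaped like the earlier examples. It assumes grpregOverlap accepts `family = "binomial"` (as suggested above) and `penalty = "grLasso"`; check `?grpregOverlap` for the options in your installed version. The fit is wrapped in `requireNamespace()` so the snippet still runs where the package is not installed.

```r
## Made-up data: numeric matrix with named columns, overlapping groups
set.seed(42)
n <- 50
X <- matrix(rnorm(n * 3), n, 3,
            dimnames = list(NULL, c("gene1", "gene2", "gene3")))
group <- list(pathway1 = c("gene1", "gene2"),
              pathway2 = c("gene2", "gene3"))
y <- rbinom(n, 1, 0.5)  # binary 0/1 response

stopifnot(all(y %in% c(0, 1)))  # logistic regression expects 0/1 coding

## Assumed arguments: family = "binomial", penalty = "grLasso"
if (requireNamespace("grpregOverlap", quietly = TRUE)) {
  fit <- grpregOverlap::grpregOverlap(X, y, group,
                                      penalty = "grLasso",
                                      family = "binomial")
}
```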

dankessler, Apr 21 '22 15:04

Yes, that is exactly what I did, with the grLasso penalty. Thanks for the suggestion.

LamineTourelab, Apr 21 '22 17:04