collinearity error with control vs treatment test for multiple subjects

Open shobhitagrawal1 opened this issue 2 years ago • 4 comments

Hi, really interesting work, and thank you for the general ease of use! My data has several subjects, each belonging to either control or treatment, so the call I am trying is lemur(sce, design = ~ subject + condition, n_embedding = 30, test_fraction = 0.5). However, I am getting this error:

Error in handle_design_parameter(design, data, col_data) : The model matrix seems degenerate ('matrix_rank(design_matrix) < ncol(design_matrix)'). Some columns are perfectly collinear. Did you maybe include the same coefficient twice?

My understanding is that the one-hot encoding of control and treatment is being flagged as collinear. Could you please tell me how to run a typical two-condition analysis with multiple subjects (treated as biological replicates)?

appreciate any help. thanking you shobhit

shobhitagrawal1, Oct 15 '23 21:10
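
The rank deficiency in the error above comes from the nesting of subject within condition: every subject belongs to exactly one condition, so the condition column equals the sum of the dummy columns of the treated subjects. A minimal base-R sketch on a hypothetical toy layout (the subject and condition labels are made up, not taken from the data in this issue) reproduces the check, using qr() as a stand-in for whatever matrix_rank does internally:

```r
# Hypothetical toy layout: four subjects, each observed in exactly one condition
col_data <- data.frame(
  subject   = factor(c("s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4")),
  condition = factor(c("control", "control", "control", "control",
                       "treatment", "treatment", "treatment", "treatment"))
)

# Design from the question: intercept + 3 subject dummies + 1 condition dummy
nested <- model.matrix(~ subject + condition, data = col_data)
nested_rank <- qr(nested)$rank
ncol(nested)   # 5
nested_rank    # 4, because conditiontreatment = subjects3 + subjects4

# Dropping the subject leaves a full-rank design
reduced <- model.matrix(~ condition, data = col_data)
reduced_rank <- qr(reduced)$rank
reduced_rank == ncol(reduced)   # TRUE
```

The same check can be run on the real column data before fitting, to see whether a design is estimable.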

Hi shobhit,

thank you :)

To fit a multi-subject two-condition analysis, set the design to ~ condition (i.e., drop the subject). This fits a single coefficient explaining the treatment effect for each gene.

If you notice that the subject effects are so strong that corresponding cells from different subjects are not aligned after calling align_by_grouping or align_harmony, you can call either method with the argument alignment_design = ~ condition + subject or alignment_design = ~ condition * subject to make the alignment more flexible. However, I advise using an alignment_design that differs from the design only if absolutely necessary, as it complicates the interpretation of the effects.

Best, Constantin

const-ae, Oct 16 '23 07:10

Dear Constantin, thank you very much for the prompt reply, much appreciated. I was also thinking of using just condition for the fit, together with align_by_grouping. My only hesitation concerns the replicates that the neighborhood analysis needs: will that still work if the replicates are not part of the design matrix?

thank you once again shobhit

shobhitagrawal1, Oct 16 '23 07:10

Yes. The way the replicates are specified is through the group_by argument in find_de_neighborhoods. Here you would set group_by = vars(subject, condition).

const-ae, Oct 16 '23 07:10
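
Putting the replies together, the workflow might look like the sketch below. It is not a verified recipe: it assumes library(lemur) is attached, that sce carries colData columns named subject and condition, and that the condition levels are called "control" and "treatment" (hypothetical names). The function name two_condition_workflow is also made up for illustration, and argument names may differ between lemur versions.

```r
# Sketch only: assumes library(lemur) is attached and that `sce` has colData
# columns `subject` and `condition` with the hypothetical levels
# "control" and "treatment".
two_condition_workflow <- function(sce) {
  # Fit with the condition only; subject is nested in condition and is left out
  fit <- lemur(sce, design = ~ condition, n_embedding = 30, test_fraction = 0.5)

  # Align corresponding cells across conditions
  # (align_by_grouping is the alternative mentioned in the thread)
  fit <- align_harmony(fit)

  # Predict the treatment effect for each gene and cell
  fit <- test_de(
    fit,
    contrast = cond(condition = "treatment") - cond(condition = "control")
  )

  # Subjects enter here as replicates, via group_by rather than the design
  find_de_neighborhoods(fit, group_by = vars(subject, condition))
}
```

Calling two_condition_workflow(sce) would return the neighborhood table. The subject only appears in the last step, which is how the replicates are used without being part of the design matrix.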

thanks once again! I will give it a try and get back to you.

shobhitagrawal1, Oct 16 '23 08:10