lemur
collinearity error with control vs treatment test for multiple subjects
Hi, Really interesting work, and thank you for the general ease of use! My data has several subjects, each belonging to either control or treatment, so the call I am trying is lemur(sce, design = ~ subject + condition, n_embedding = 30, test_fraction = 0.5). However, I am getting this error:
Error in handle_design_parameter(design, data, col_data) : The model matrix seems degenerate ('matrix_rank(design_matrix) < ncol(design_matrix)'). Some columns are perfectly collinear. Did you maybe include the same coefficient twice?
My understanding is that the one-hot encodings for control and treatment are being flagged as collinear. Could you please tell me how to run a typical multi-subject, two-condition analysis (treating the subjects as biological replicates)?
I appreciate any help. Thank you, shobhit
Hi shobhit,
thank you :)
To fit a multi-subject two-condition analysis, set the design to ~ condition (i.e., drop the subject). This fits a single coefficient explaining the treatment effect for each gene.
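A minimal base R sketch of why the original design fails: when every subject belongs to exactly one condition, the condition column equals the sum of the subject columns from that condition, so the model matrix loses rank. The toy metadata and the helper design_rank are hypothetical and only illustrate the rank check; lemur's internal check may be implemented differently.

```r
# Hypothetical toy metadata: 4 subjects, each in exactly one condition
# (s1, s2 are control; s3, s4 are treatment), 3 cells per subject.
col_data <- data.frame(
  subject   = factor(rep(c("s1", "s2", "s3", "s4"), each = 3)),
  condition = factor(rep(c("control", "treatment"), each = 6))
)

# Illustrative helper: compare the rank of the model matrix with its column count.
design_rank <- function(formula, data) {
  X <- model.matrix(formula, data)
  c(rank = qr(X)$rank, ncol = ncol(X))
}

# Subject nested in condition: conditiontreatment = subjects3 + subjects4,
# so the rank (4) is smaller than the number of columns (5).
nested <- design_rank(~ subject + condition, col_data)

# Condition only: full rank (2 columns, rank 2).
reduced <- design_rank(~ condition, col_data)

print(nested)
print(reduced)
```

If each subject had contributed cells to both conditions, ~ subject + condition would be full rank; the problem is specific to subjects nested within a condition.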
If you notice that the subject effects are so strong that corresponding cells from different subjects are not aligned after calling align_by_grouping or align_harmony, you can call either method with the argument alignment_design = ~ condition + subject or alignment_design = ~ condition * subject to make the alignment more flexible. However, I advise using different formulas for design and alignment_design only if absolutely necessary, as it complicates the interpretation of the effects.
Best, Constantin
Dear Constantin, Thank you very much for the prompt reply, much appreciated. I was also thinking of using just condition for the fit together with align_by_grouping. My only hesitation was about the replicates that the neighborhood analysis needs: will that still be possible if the replicates are not part of the design matrix?
thank you once again shobhit
Yes. The way the replicates are specified is through the group_by argument in find_de_neighborhoods. Here you would set group_by = vars(subject, condition).
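Putting the advice from this thread together, the workflow might look roughly like the sketch below. It assumes that the lemur package is installed (plus harmony for align_harmony), that sce has subject and condition columns in its colData, and that the condition levels are named "control" and "treatment". The wrapper function name and the level names are assumptions for illustration, not part of lemur.

```r
library(lemur)

# Hypothetical wrapper; `sce` is the user's SingleCellExperiment with
# colData columns `subject` and `condition` (names assumed from the thread).
run_two_condition_analysis <- function(sce) {
  # Fit with condition only; subject is left out of the design because it is
  # nested within condition and would make the model matrix rank-deficient.
  fit <- lemur(sce, design = ~ condition, n_embedding = 30, test_fraction = 0.5)

  # Align corresponding cells across conditions (align_by_grouping is the alternative).
  fit <- align_harmony(fit)

  # Per-cell treatment effect; the level names are assumptions.
  fit <- test_de(fit, contrast = cond(condition = "treatment") - cond(condition = "control"))

  # The replicates enter here, through group_by, not through the design.
  find_de_neighborhoods(fit, group_by = vars(subject, condition))
}
```

Calling run_two_condition_analysis(sce) would return the neighborhood table. The subjects act as replicates because the cells are aggregated per subject and condition combination before testing.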
thanks once again! I will give it a try and get back to you.