TF analysis between conditions scRNA

Open kiwipeel opened this issue 2 years ago • 9 comments

How can I determine whether the TF activity for a gene differs significantly between two conditions in my single-cell dataset?

kiwipeel avatar Mar 16 '24 12:03 kiwipeel

Hi @kiwipeel,

You can perform differential expression analysis at the pseudobulk level between conditions and then use the resulting contrast-level gene statistics as input for decoupler. There is an example of this workflow in this vignette. It is in Python, but it should be relatively easy to reproduce in R if that is a limitation. Hope this is helpful!

PauBadiaM avatar Mar 18 '24 07:03 PauBadiaM
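
The second step of that workflow can be sketched roughly as follows. This is a minimal, standard-library illustration of the univariate linear model (ULM) idea, not decoupler's actual API: for each TF, the contrast-level gene statistics are regressed on the TF's target weights (zero for non-targets), and the t-value of the slope is taken as the activity score. The gene names, statistics, network, and the helper `ulm_score` are all hypothetical.

```python
import math

def ulm_score(stats, weights):
    """t-value of the slope when regressing gene statistics on TF target weights.

    Assumes the weights are not constant and the fit is not perfect (|r| < 1).
    """
    n = len(stats)
    mean_x = sum(weights) / n
    mean_y = sum(stats) / n
    sxx = sum((x - mean_x) ** 2 for x in weights)
    syy = sum((y - mean_y) ** 2 for y in stats)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, stats))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    return r * math.sqrt((n - 2) / (1 - r * r))  # t-value of the slope

# Hypothetical contrast statistics (e.g. t-values from a pseudobulk DE test).
gene_stats = {"G1": 4.0, "G2": 3.0, "G3": -2.5, "G4": 0.5, "G5": -0.3, "G6": 0.1}

# Hypothetical network: TF -> {target gene: signed weight}.
network = {
    "TF_A": {"G1": 1.0, "G2": 1.0, "G3": -1.0},
    "TF_B": {"G4": 1.0, "G5": 1.0},
}

genes = sorted(gene_stats)
y = [gene_stats[g] for g in genes]

# One activity score per TF for the whole contrast (not one per cell).
activities = {
    tf: ulm_score(y, [targets.get(g, 0.0) for g in genes])
    for tf, targets in network.items()
}
```

Here `TF_A` gets a large positive score because its activated targets go up and its repressed target goes down in the contrast, while `TF_B` stays close to zero.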

Why can't I just apply statistical tests to the score values generated by the ULM model? Thank you in advance.

kiwipeel avatar Mar 18 '24 08:03 kiwipeel

Hi @kiwipeel,

You could also do that, but if the objective is to compare conditions I would recommend going the pseudobulk route, since it avoids inflating significance (p-values that are too small) by treating single cells as true replicates, which they are not.

PauBadiaM avatar Mar 18 '24 08:03 PauBadiaM
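
To make the replicate issue concrete, here is a small sketch with made-up numbers. It assumes three biological samples per condition with 50 cells each, and it averages per-cell scores per sample purely for illustration (an actual pseudobulk workflow would typically aggregate counts per sample before the differential expression test). The helpers `welch_t` and `make_cells` are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for two independent groups."""
    return (mean(b) - mean(a)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))

def make_cells(sample_mean, n_cells=50):
    # Hypothetical per-cell TF activity scores with small, deterministic noise.
    return [sample_mean + (0.1 if i % 2 == 0 else -0.1) for i in range(n_cells)]

# Three biological samples per condition; sample-to-sample variation is large.
ctrl = {"c1": make_cells(1.0), "c2": make_cells(2.0), "c3": make_cells(3.0)}
treated = {"t1": make_cells(2.0), "t2": make_cells(3.0), "t3": make_cells(4.0)}

# Cell-level test: every cell counts as a replicate (n = 150 per group).
t_cells = welch_t(
    [v for cells in ctrl.values() for v in cells],
    [v for cells in treated.values() for v in cells],
)

# Sample-level test: one aggregated value per sample (n = 3 per group).
t_pseudobulk = welch_t(
    [mean(cells) for cells in ctrl.values()],
    [mean(cells) for cells in treated.values()],
)
```

The same shift between conditions gives a much larger test statistic at the cell level than at the sample level, only because the number of "replicates" went from 3 to 150 per group.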

Thank you. The p-values in the run_ulm results represent the significance of the score for each cell and transcription factor, am I right? Why do we create a new assay from all of these scores when some of them do not have significant p-values?

kiwipeel avatar Mar 18 '24 09:03 kiwipeel

Hi @kiwipeel,

Indeed! We keep all of them because any p-value threshold is arbitrary: depending on the application, you might want a stricter or a more relaxed threshold.

PauBadiaM avatar Mar 18 '24 09:03 PauBadiaM

Thank you. However, if I put a threshold on the p-value, there will be missing values in the new assay we generate from the TF scores. What is the correct way to handle this?

kiwipeel avatar Mar 18 '24 10:03 kiwipeel

It really depends on the downstream task you want to use them for. In your case, since you are interested in contrasting conditions, I would again recommend doing it at the pseudobulk level. There, filtering by p-value is easier to handle, since you obtain a single vector of activity changes that may or may not be significant.

PauBadiaM avatar Mar 19 '24 09:03 PauBadiaM
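
As a rough illustration of why the single-vector case is easier: filtering one contrast-level result simply drops TFs, instead of leaving holes in a cells × TFs matrix. The TF names, scores, p-values, and the helper `significant` below are hypothetical.

```python
# Hypothetical contrast-level results: TF -> (activity score, p-value).
contrast = {
    "TF_A": (10.1, 0.0005),
    "TF_B": (-0.5, 0.66),
    "TF_C": (-3.2, 0.03),
}

def significant(results, alpha=0.05):
    """Keep TFs whose p-value is below alpha; the threshold is a free choice."""
    return {tf: score for tf, (score, pval) in results.items() if pval < alpha}

# A stricter and a more relaxed threshold give different, but complete, lists.
strict = significant(contrast, alpha=0.01)
relaxed = significant(contrast, alpha=0.05)
```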

Thank you again. One last question: after getting the TF assay from the model, should I use the ScaleData() function on the TF assay with the split.by argument based on my conditions?

kiwipeel avatar Mar 19 '24 09:03 kiwipeel

Hi @kiwipeel, if it is just for plotting, yes. I am not sure about the split.by argument though: wouldn't you want to see the differences between your conditions instead?

PauBadiaM avatar Mar 19 '24 09:03 PauBadiaM
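
The concern about split.by can be illustrated with a small sketch. It assumes that splitting means each condition is z-scored separately, and it uses made-up scores for a single TF; `zscore` is a hypothetical helper, not Seurat's implementation.

```python
from statistics import mean, stdev

def zscore(values):
    """Centre to mean 0 and scale to unit standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical activity scores of one TF across cells in two conditions.
ctrl = [1.0, 1.5, 2.0]
treated = [3.0, 3.5, 4.0]

# Scaling all cells together keeps the shift between conditions.
joint = zscore(ctrl + treated)
joint_ctrl, joint_treated = joint[:3], joint[3:]

# Scaling each condition separately centres both groups on zero,
# so the difference between conditions disappears from the scaled values.
split_ctrl, split_treated = zscore(ctrl), zscore(treated)
```

With joint scaling the treated cells stay above the control cells; with per-condition scaling both groups end up with the same mean, which hides exactly the contrast of interest.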