tidySingleCellExperiment

Clarifications about `.abundance_counts`

Open denvercal1234GitHub opened this issue 1 year ago • 4 comments

Hi @stemangiola, my apologies for re-asking these questions as a follow-up to #92. Would you mind giving me some pointers?

Q1. Could you confirm that the duplicated cell identifiers are a byproduct of the plotting code, and not a sign that the input data itself contains duplicated cell identifiers?

Q2. What exactly is the `.abundance_` function calculating when there is one feature, and when there are multiple features? Can we control how the expression is aggregated for plotting (e.g., median)?

Q3. Is it possible to modify the code below so that a cell cluster is split by an expression threshold on a marker (or set of markers), and the resulting subsets are plotted next to each other in the bar plot?

For example, in the plot below, CATALYST28meta16 holds my cluster IDs on the x-axis. Instead of one bar containing all the cells of each cluster, I want two bars per cluster: one for the cells with "low" CD3 expression and one for the cells with "high" CD3 expression.

[Screenshot, 2023-09-08: bar plot with the CATALYST28meta16 clusters on the x-axis]

denvercal1234GitHub, Sep 08 '23 17:09

> Q1. Could you confirm that the duplicated cell identifiers are a byproduct of the plotting code, and not a sign that the input data itself contains duplicated cell identifiers?

`join_features(shape = "long")` creates a long table, so each `.cell` is repeated, once for every feature (gene).
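To make the long format concrete, here is a small base-R sketch with made-up values. The `expr` matrix and the cell names are hypothetical, and this is not tidySingleCellExperiment's own code; it only mimics the `.cell` / `.feature` / `.abundance_<assay>` layout of the long output. With 3 cells and 2 features you get 6 rows, and every cell identifier appears twice.

```r
# Hypothetical wide expression matrix: 2 features (rows) x 3 cells (columns).
expr <- matrix(
  c(0.5, 2.1, 0.0,   # CD4 in cell_1, cell_2, cell_3
    1.8, 0.2, 3.0),  # CD3 in cell_1, cell_2, cell_3
  nrow = 2, byrow = TRUE,
  dimnames = list(c("CD4", "CD3"), c("cell_1", "cell_2", "cell_3"))
)

# Long format: one row per (cell, feature) pair. as.vector() reads the matrix
# column by column, i.e. all features of cell_1, then all features of cell_2, ...
long <- data.frame(
  .cell = rep(colnames(expr), each = nrow(expr)),
  .feature = rep(rownames(expr), times = ncol(expr)),
  .abundance_exprs = as.vector(expr)
)

print(long)
table(long$.cell)  # each cell appears once per feature
```

The repeated identifiers come from the reshaping step itself, so they grow with the number of features requested.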

> Q2. What exactly is the `.abundance_` function calculating when there is one feature, and when there are multiple features? Can we control how the expression is aggregated for plotting (e.g., median)?

`.abundance_` is not a function but a column name. It just holds the value of a gene in an assay (e.g. `.abundance_counts` for the counts assay); nothing is aggregated.
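Because `.abundance_exprs` (or `.abundance_counts`, etc.) is an ordinary column, any summary such as a median is up to you. A minimal base-R sketch on a hypothetical long table; the cluster labels, cells and values are invented for illustration:

```r
# Hypothetical long table: 2 clusters x 2 cells x 2 features = 8 rows.
long <- data.frame(
  cluster = rep(c("14", "15"), each = 4),
  .cell = rep(c("c1", "c2", "c3", "c4"), each = 2),
  .feature = rep(c("CD4", "CD3"), times = 4),
  .abundance_exprs = c(1, 4, 3, 2, 0, 5, 2, 7)
)

# Median expression per cluster and feature, computed explicitly by the user.
# dplyr equivalent:
#   long |> dplyr::group_by(cluster, .feature) |>
#     dplyr::summarise(median_expr = median(.abundance_exprs))
med <- aggregate(
  x = long[".abundance_exprs"],
  by = long[c("cluster", ".feature")],
  FUN = median
)
print(med)
```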

> Q3. Is it possible to modify the code below so that a cell cluster is split by an expression threshold on a marker (or set of markers), and the resulting subsets are plotted next to each other in the bar plot?

Yes, you can adapt this code:

```r
# <xxx> is the assay name; my_threshold is your cutoff
|> mutate(high_value = .abundance_<xxx> > my_threshold)
```

Then you can group by the `high_value` category in ggplot.
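A minimal sketch of that idea on a hypothetical per-cell table, shaped like the output of `join_features()` for a single feature (CD3), i.e. one row per cell. The values and the cutoff `my_threshold` are made up, and the flag column is called `CD3_level` rather than `high_value` only so the legend reads naturally:

```r
library(ggplot2)

# Hypothetical per-cell table: cluster label plus CD3 expression (invented values).
cells <- data.frame(
  CATALYST28meta16 = c("14", "14", "14", "15", "15", "15", "15"),
  .abundance_exprs = c(0.2, 2.5, 3.1, 0.1, 0.4, 2.8, 0.3)
)

my_threshold <- 1  # hypothetical cutoff; pick one suited to your assay

# Flag each cell as CD3 high/low (base R equivalent of the mutate() above).
cells$CD3_level <- ifelse(cells$.abundance_exprs > my_threshold, "high", "low")

# Cell counts behind the bars: one row per cluster, one column per level.
counts <- table(cells$CATALYST28meta16, cells$CD3_level)

# Two bars per cluster, CD3-low and CD3-high, placed side by side.
p <- ggplot(cells, aes(CATALYST28meta16, fill = CD3_level)) +
  geom_bar(position = position_dodge())
```

With several features in long format, the same `mutate()` would flag cell/feature rows rather than cells, so keep only the marker of interest (for example `dplyr::filter(.feature == "CD3")`) before counting.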

stemangiola, Sep 09 '23 02:09

Thank you @stemangiola for your response. If `.abundance_` is simply a column name, what does the code below (specifically `join_features(features = c("CD4", "CD3"))`) plot? It seems the different features must have been aggregated in some way to produce the plot.

Or does it simply pull the expression value of every included feature for each cell and plot all of them, without doing anything to the expression across cells? If so, can we label which dots correspond to which feature?

`?tidySingleCellExperiment::join_features` only says that "This function extracts information for specified features and returns the information in either long or wide format," but it does not explain how the features are joined.

Thank you again.

```r
library(tidySingleCellExperiment)
library(ggplot2)
F37_sce_backboneClustering |> dplyr::filter(CATALYST28meta16 %in% c("14", "15")) |> join_features(features = c("CD4", "CD3")) |>
  ggplot(aes(CATALYST28meta16, .abundance_exprs, fill = CATALYST28meta16)) + geom_violin(position = position_dodge(0.75))
```
[Screenshot, 2023-09-10: violin plots of `.abundance_exprs` for clusters 14 and 15]

denvercal1234GitHub, Sep 10 '23 13:09

I think you should add `facet_wrap(~.feature)`; in your plot you are ignoring the `.feature` column.
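A sketch of what changes with faceting, on a hypothetical long table that mimics the layout of `join_features(features = c("CD4", "CD3"))` output (2 cells per cluster, invented values). In the unfaceted plot `.feature` is not mapped to any aesthetic, so each violin is estimated from every row of its cluster: the CD4 and CD3 values are pooled together, not summed. `facet_wrap(~.feature)` gives one panel per gene:

```r
library(ggplot2)

# Hypothetical long table (invented values): 2 cells per cluster x 2 features = 8 rows.
long <- data.frame(
  CATALYST28meta16 = rep(c("14", "15"), each = 4),
  .cell = rep(c("c1", "c2", "c3", "c4"), each = 2),
  .feature = rep(c("CD4", "CD3"), times = 4),
  .abundance_exprs = c(1, 4, 3, 2, 0, 5, 2, 7)
)

in_14 <- long$CATALYST28meta16 == "14"

# Values behind the unfaceted violin for cluster 14: all CD4 and CD3 rows pooled.
pooled_14 <- long$.abundance_exprs[in_14]

# Per-cell sums, for contrast: this is NOT what the unfaceted violin shows.
summed_14 <- tapply(long$.abundance_exprs[in_14], long$.cell[in_14], sum)

# One panel per gene, as suggested above.
p <- ggplot(long, aes(CATALYST28meta16, .abundance_exprs, fill = CATALYST28meta16)) +
  geom_violin() +
  facet_wrap(~.feature)
```

If you would rather keep a single panel, mapping `fill = .feature` together with `position_dodge()` would also show which values belong to which feature.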

stemangiola, Sep 11 '23 00:09

Thanks @stemangiola, but I was wondering what the plot above represents for these genes without `facet_wrap(~.feature)`, since a plot was still generated. Is it the sum of the expression of these two genes?

denvercal1234GitHub, Dec 27 '23 14:12