tidySingleCellExperiment
Clarifications about `.abundance_counts`
Hi @stemangiola - My apologies for re-asking these questions following up from #92. Would you mind giving me some pointers?
Q1. Could you confirm that the duplicated cell identifiers are a byproduct of the plotting function, and not a sign that the input data has duplicated cell identifiers?
Q2. What exactly is the .abundance_ function calculating when there is one feature and when there are multiple features? Can we control how the expression is grouped for plotting, e.g., median, ...?
Q3. Is it possible to modify the code below so that we can split a cell cluster by some threshold of expression of a marker or set of markers and plot these cell splits next to each other in the bar plot?
For example, in the plot below, CATALYST28meta16 holds my cluster IDs on the x-axis. Instead of one bar with all the cells per cluster, I want two bars for each cluster: one for the group of cells with "low" CD3 expression and one for the group of cells with "high" CD3 expression.
Q1. Could you confirm that the duplicated cell identifiers are a byproduct of the plotting function, and not a sign that the input data has duplicated cell identifiers?
join_features with shape = "long" creates a long table, so the cell identifiers are repeated (one row for every gene).
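To illustrate the long shape, here is a minimal sketch on a toy table rather than on a real SingleCellExperiment object. The column names .cell, .feature and .abundance_exprs mirror the ones used in this thread; the values are invented, and tidyr::pivot_longer only stands in for the reshaping, it is not what join_features calls internally.

```r
library(dplyr)
library(tidyr)

# Toy wide table: one row per cell (values are made up for illustration)
wide <- tibble(
  .cell = c("cell_1", "cell_2", "cell_3"),
  CD3   = c(0.2, 1.5, 3.1),
  CD4   = c(2.0, 0.1, 0.7)
)

# Long shape: one row per cell-feature pair,
# so each cell identifier appears once per feature
long <- wide |>
  pivot_longer(c(CD3, CD4), names_to = ".feature", values_to = ".abundance_exprs")

# Cell identifiers are unique in the input but repeated in the long table
count(long, .cell)
```

With two features, every cell shows up twice in the long table even though the input has no duplicated cells.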
Q2. What exactly is the .abundance_ function calculating when there is one feature and when there are multiple features? Can we control how the expression is grouped for plotting, e.g., median, ...?
.abundance_ is not a function but rather a column name. It just holds the value extracted for a gene from an assay.
Q3. Is it possible to modify the code below so that we can split a cell cluster by some threshold of expression of a marker or set of markers and plot these cell splits next to each other in the bar plot?
Yes, you can adapt this code:
|> mutate(high_value = .abundance_<xxx> > my_threshold)
Then you can group by the high_value category in the ggplot.
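As a rough sketch of that suggestion, assuming a toy long table in place of the real join_features output: my_threshold, CD3_level and all values below are hypothetical, and the cut-off of 1 is arbitrary.

```r
library(dplyr)
library(ggplot2)

# Toy long table standing in for join_features() output (values invented)
toy <- tibble(
  .cell            = paste0("cell_", 1:8),
  CATALYST28meta16 = rep(c("14", "15"), each = 4),
  .feature         = "CD3",
  .abundance_exprs = c(0.1, 0.4, 2.3, 3.0, 0.2, 1.9, 2.5, 0.3)
)

my_threshold <- 1  # hypothetical cut-off, choose one that suits your data

# Label each cell as "high" or "low" for CD3, then count cells per cluster and label
split_counts <- toy |>
  mutate(CD3_level = if_else(.abundance_exprs > my_threshold, "high", "low")) |>
  count(CATALYST28meta16, CD3_level)

# Two bars per cluster, side by side
p <- ggplot(split_counts, aes(CATALYST28meta16, n, fill = CD3_level)) +
  geom_col(position = "dodge")
```

On the real object, the same mutate step would go right after join_features for the single marker of interest.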
Thank you @stemangiola for your response. If .abundance_ is simply a column name, then what does the code below (specifically join_features(features = c("CD4", "CD3"))) plot? Presumably the different features were aggregated in some way to produce the plot.
Or does it simply pull the expression value of each cell for every included feature and plot them all, without doing anything to the expression across cells? If so, can we label which dots correspond to which feature?
?tidySingleCellExperiment::join_features just says that "This function extracts information for specified features and returns the information in either long or wide format," but it is not clear how the features are joined.
Thank you again.
F37_sce_backboneClustering |> dplyr::filter(CATALYST28meta16 %in% c("14", "15")) |> join_features(features = c("CD4", "CD3")) |>
ggplot(aes(CATALYST28meta16, .abundance_exprs, fill = CATALYST28meta16)) + geom_violin(position = position_dodge(0.75))
I think you should add facet_wrap(~.feature); in your plot you are ignoring the feature column.
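A minimal sketch of the faceted version, again on an invented toy table rather than the real object (only the column names are taken from the code above):

```r
library(dplyr)
library(ggplot2)

# Toy long table: two clusters, two features (values invented)
toy <- tibble(
  CATALYST28meta16 = rep(c("14", "15"), times = 6),
  .feature         = rep(c("CD3", "CD4"), each = 6),
  .abundance_exprs = c(0.1, 0.5, 1.2, 2.2, 3.1, 0.4,
                       2.0, 2.4, 0.3, 1.1, 0.8, 3.3)
)

# One panel per feature, so CD3 and CD4 values are drawn separately
p <- ggplot(toy, aes(CATALYST28meta16, .abundance_exprs, fill = CATALYST28meta16)) +
  geom_violin(position = position_dodge(0.75)) +
  facet_wrap(~.feature)
```

Each panel is titled with its feature name, which also shows which values belong to which feature.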
Thanks @stemangiola, but I was wondering what the plot above represents for these genes without facet_wrap(~.feature), as a plot was still generated. Is it the sum of the expression of these 2 genes?