tidySingleCellExperiment
subsample sce object based on factor in colData
I would like to subsample a SingleCellExperiment object based on a factor in colData.
I have a singleCellExperiment object:
> sce
# A SingleCellExperiment-tibble abstraction: 13,268,769 × 6
# Features=42 | Assays=exprs
with some colData:
> colData(sce)
DataFrame with 13268769 rows and 5 columns
sample_id condition patient_id label1 cluster_id
<factor> <factor> <factor> <numeric> <factor>
1 D929I Ref D929I 36 302
2 D929I Ref D929I 29 285
3 D929I Ref D929I 50 103
4 D929I Ref D929I 36 302
5 D929I Ref D929I 51 181
... ... ... ... ... ...
13268765 D232I Ref D232I 51 201
13268766 D232I Ref D232I 28 304
13268767 D232I Ref D232I 50 5
13268768 D232I Ref D232I 51 184
13268769 D232I Ref D232I 18 364
I would like to subsample based on the cluster_id column so that I keep at most X (e.g. 500) events from each cluster.
I can get the selection of cells using the following code:
> sce %>% group_by(cluster_id) %>% slice_sample(n=500) %>% ungroup()
tidySingleCellExperiment says: A data frame is returned for independent data analysis.
# A tibble: 200,000 × 6
.cell sample_id condition patient_id label1 cluster_id
<chr> <fct> <fct> <fct> <dbl> <fct>
1 4002318 D0749I Ref D0749I 60 1
2 10259368 D590I Ref D590I 60 1
3 12615676 D232I Ref D232I 25 1
4 6765422 D694I Ref D694I 25 1
5 9415336 D0553I Ref D0553I 60 1
6 7245671 D694I Ref D694I 25 1
7 7177144 D694I Ref D694I 42 1
8 7002069 D694I Ref D694I 49 1
9 8732040 D615I Ref D615I 60 1
10 3989255 D0749I Ref D0749I 60 1
# … with 199,990 more rows
# ℹ Use `print(n = ...)` to see more rows
But I don't know how I would use this to filter the original singleCellExperiment object.
Could you please give me a pointer?
Thanks
Sorry, this slipped through the cracks.
At the moment you can use:
nest() |>
mutate(map(...)) |>
unnest()
In the future we might be able to add group_by while preserving the SingleCellExperiment, but we don't have plans yet. (Pull requests are always welcome, though!)
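For anyone landing here in the meantime: one way to get the "at most N cells per cluster" subset is plain column indexing on the SingleCellExperiment, which sidesteps the grouped tidy verbs altogether. This is a different route from the nest/map/unnest pipeline suggested above, not an implementation of it. The following is only a minimal sketch: `subsample_per_group` is a hypothetical helper name, and the toy object is made-up data standing in for the real `sce`.

```r
library(SingleCellExperiment)

# Toy object standing in for the real `sce`: 42 features, 3,000 cells,
# three clusters of unequal size (hypothetical data, for illustration only).
set.seed(1)
n_cells <- 3000
toy <- SingleCellExperiment(
  assays = list(exprs = matrix(rnorm(42 * n_cells), nrow = 42)),
  colData = DataFrame(
    cluster_id = factor(rep(c("1", "2", "3"), times = c(2000, 900, 100)))
  )
)

# Hypothetical helper: keep at most `n` cells per level of a colData column.
subsample_per_group <- function(x, group_col, n) {
  # Column indices of `x`, split by group; drop = TRUE skips empty levels.
  by_group <- split(seq_len(ncol(x)), colData(x)[[group_col]], drop = TRUE)
  # Sample positions within each group; groups smaller than `n` are kept whole.
  keep <- lapply(by_group, function(i) {
    i[sample.int(length(i), min(length(i), n))]
  })
  # Subset columns, preserving the original cell order.
  x[, sort(unlist(keep, use.names = FALSE))]
}

toy_sub <- subsample_per_group(toy, "cluster_id", n = 500)
table(toy_sub$cluster_id)
```

On the real data the call would be `subsample_per_group(sce, "cluster_id", n = 500)`. Sampling via `sample.int(length(i), ...)` rather than `sample(i, ...)` avoids the base R pitfall where a length-one numeric vector is treated as `1:i`.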