tidySingleCellExperiment

subsample sce object based on factor in colData

Open baj12 opened this issue 2 years ago • 1 comments

I would like to subsample a SingleCellExperiment object based on a factor in colData.

I have a SingleCellExperiment object:

> sce
# A SingleCellExperiment-tibble abstraction: 13,268,769 × 6
# Features=42 | Assays=exprs

with some colData:

> colData(sce)
DataFrame with 13268769 rows and 5 columns
         sample_id condition patient_id    label1 cluster_id
          <factor>  <factor>   <factor> <numeric>   <factor>
1            D929I       Ref      D929I        36        302
2            D929I       Ref      D929I        29        285
3            D929I       Ref      D929I        50        103
4            D929I       Ref      D929I        36        302
5            D929I       Ref      D929I        51        181
...            ...       ...        ...       ...        ...
13268765     D232I       Ref      D232I        51        201
13268766     D232I       Ref      D232I        28        304
13268767     D232I       Ref      D232I        50        5  
13268768     D232I       Ref      D232I        51        184
13268769     D232I       Ref      D232I        18        364

I would like to subsample based on the cluster_id column so that I keep at most X (here 500) cells from each cluster.

I can get the selection of cells using the following code:

> sce %>% group_by(cluster_id) %>% slice_sample(n=500) %>% ungroup()
tidySingleCellExperiment says: A data frame is returned for independent data analysis.
# A tibble: 200,000 × 6
   .cell    sample_id condition patient_id label1 cluster_id
   <chr>    <fct>     <fct>     <fct>       <dbl> <fct>     
 1 4002318  D0749I    Ref       D0749I         60 1         
 2 10259368 D590I     Ref       D590I          60 1         
 3 12615676 D232I     Ref       D232I          25 1         
 4 6765422  D694I     Ref       D694I          25 1         
 5 9415336  D0553I    Ref       D0553I         60 1         
 6 7245671  D694I     Ref       D694I          25 1         
 7 7177144  D694I     Ref       D694I          42 1         
 8 7002069  D694I     Ref       D694I          49 1         
 9 8732040  D615I     Ref       D615I          60 1         
10 3989255  D0749I    Ref       D0749I         60 1         
# … with 199,990 more rows
# ℹ Use `print(n = ...)` to see more rows

But this returns a plain tibble, and I don't know how to use it to filter the original SingleCellExperiment object.

Could you please give me a pointer?

Thanks

baj12, Feb 19 '23 13:02

Sorry, this slipped through the cracks.

At the moment you can use

nest() |>
mutate(map(...)) |>
unnest()

In the future we might be able to support group_by while preserving the SingleCellExperiment, but we don't have plans for that yet. (Pull requests are always welcome, though!)
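For readers who just need the subsampled object, here is a minimal sketch of a different route that avoids the tidy verbs: compute column indices per cluster in base R, then subset the object by column. It is not the nest/map/unnest pattern above and not part of the tidySingleCellExperiment API. The helper name `sample_per_group` and the toy `cluster_id` vector are made up for illustration, and the commented lines at the end assume `sce` is a SingleCellExperiment with a `cluster_id` column in its colData.

```r
# Hypothetical helper (not part of any package): return the positions of
# at most n_max elements per level of a grouping vector.
sample_per_group <- function(groups, n_max = 500) {
  idx_by_group <- split(seq_along(groups), groups, drop = TRUE)
  keep <- lapply(idx_by_group, function(idx) {
    # Sample positions within the group; groups at or below n_max are kept whole.
    if (length(idx) > n_max) idx[sample.int(length(idx), n_max)] else idx
  })
  # Sort so the selected cells keep their original order.
  sort(unlist(keep, use.names = FALSE))
}

# Toy stand-in for colData(sce)$cluster_id (illustration only).
set.seed(1)
cluster_id <- factor(sample(c("1", "2", "3"), 2000, replace = TRUE,
                            prob = c(0.70, 0.25, 0.05)))
keep <- sample_per_group(cluster_id, n_max = 500)
print(table(cluster_id[keep]))

# With a real object (assumes library(SingleCellExperiment) is loaded):
# keep    <- sample_per_group(colData(sce)$cluster_id, n_max = 500)
# sce_sub <- sce[, keep]
```

Indexing with `sce[, keep]` is ordinary column subsetting, so the result remains a SingleCellExperiment and the assays and colData stay in sync. Sampling through `sample.int(length(idx), n_max)` rather than `sample(idx, n_max)` avoids the base R quirk where `sample()` treats a single number as `1:n`.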

stemangiola, Mar 06 '23 03:03