
[FEATURE] Subsetting SRA projects by samples prior to making the RangedSummarizedExperiment object

Open · schelhorn opened this issue 2 years ago • 0 comments

Hi Leonardo,

Thanks for recount3! I was wondering whether there is a hack to build the RangedSummarizedExperiment object with recount3::create_rse() for only a subset of the samples in a particular SRA project. The main reason is that, for the larger SRA projects with 3000+ samples, building the SE object seems to take a lot of memory (and time) just to get to the subset of samples one is interested in.

Possibly this is already implemented somewhere in the package, since the counts-reading function seems to have a sample selector (see read_counts <- function(counts_file, samples = NULL)). If so, I'd be grateful for a pointer, thanks.
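To illustrate what selecting samples at read time could save, here is a minimal base-R sketch. It is not recount3's read_counts(): read_counts_subset() is a hypothetical helper, and the assumed file layout (a tab-delimited matrix whose first column holds feature IDs and whose remaining columns are named by sample ID) may differ from recount3's actual counts files. It reads only the header line first, then passes colClasses = "NULL" for unselected columns so that read.delim() skips them instead of storing the full matrix.

```r
# Hypothetical helper (not part of recount3). Assumed layout: tab-delimited,
# header row = feature-ID column name followed by sample IDs, one row per feature.
read_counts_subset <- function(counts_file, samples = NULL) {
  # Read only the header line to learn which sample columns exist
  header <- strsplit(readLines(counts_file, n = 1L), "\t", fixed = TRUE)[[1]]
  sample_cols <- header[-1]
  # NULL means "keep every sample"; otherwise drop duplicate requests
  samples <- if (is.null(samples)) sample_cols else unique(samples)
  unknown <- setdiff(samples, sample_cols)
  if (length(unknown) > 0) {
    stop("Unknown sample IDs: ", paste(unknown, collapse = ", "))
  }
  # colClasses = "NULL" makes read.delim() skip a column entirely,
  # so unselected samples are never stored in the data frame
  col_classes <- c("character",
                   ifelse(sample_cols %in% samples, "numeric", "NULL"))
  counts <- read.delim(counts_file, colClasses = col_classes,
                       check.names = FALSE)
  # First column holds feature IDs; the rest becomes a numeric matrix
  mat <- as.matrix(counts[, -1, drop = FALSE])
  rownames(mat) <- counts[[1]]
  # Return the columns in the order the caller requested
  mat[, samples, drop = FALSE]
}
```

With a file holding samples S1, S2 and S3, calling read_counts_subset(path, c("S3", "S1")) would return a two-column matrix in the requested order without keeping S2 in memory. Whether recount3's own reader can skip columns this way depends on its file format and implementation.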

If it is not implemented yet, a simple implementation could let the user inspect the SRA project metadata via recount3::available_samples(my_sra_project), select samples by their external_id, and pass those IDs to recount3::create_rse() through a new parameter, external_sample_ids = NULL. recount3::create_rse_manual() could then forward them directly to its call to recount3::read_counts(), perhaps after some sanity checks that the requested external sample IDs exist and are not duplicated.
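The suggested sanity checks could be as small as the following sketch. Note that external_sample_ids is the proposed argument, not an existing recount3 parameter; check_external_sample_ids() is a hypothetical helper; and available_ids stands in for the external_id values of the chosen project, however create_rse() would obtain them.

```r
# Hypothetical validation for the proposed `external_sample_ids` argument
# (not an existing recount3 parameter).
check_external_sample_ids <- function(external_sample_ids, available_ids) {
  if (is.null(external_sample_ids)) {
    # NULL keeps the current behaviour: use every sample in the project
    return(available_ids)
  }
  # Reject IDs requested more than once
  dup <- unique(external_sample_ids[duplicated(external_sample_ids)])
  if (length(dup) > 0) {
    stop("Duplicated external sample IDs: ", paste(dup, collapse = ", "))
  }
  # Reject IDs that are not part of the project
  unknown <- setdiff(external_sample_ids, available_ids)
  if (length(unknown) > 0) {
    stop("External sample IDs not found in project: ",
         paste(unknown, collapse = ", "))
  }
  external_sample_ids
}
```

Because the helper returns either the validated IDs or all IDs when the argument is NULL, the default behaviour of create_rse() would stay unchanged. The result could then be passed on as the samples argument of read_counts(), assuming that argument expects external IDs.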

schelhorn, Nov 29 '21 08:11