
[ENH] Include data processing steps, the reference genome the reads were aligned to, and if possible the lab protocol in the main table

Open ajandria opened this issue 1 year ago • 1 comments

Is your feature request related to a problem? Please describe.

I was wondering whether it is possible to also retrieve the data processing description that is present in the sample records in GEO. See https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM6005004 for an example. That record holds a lot of information that we would like to see in the table that pysradb generates:

Status
Title
Sample type
Source name
Organism
Characteristics
Treatment protocol
Growth protocol
Extracted molecule
Extraction protocol
Library strategy
Library source
Library selection
Instrument model
Description
Data processing

Describe the solution you'd like

I like the table that is currently generated using the following: df = db.sra_metadata(df["study_accession"], detailed = True, expand_sample_attributes = True, output_read_lengths = True). However, I feel it sometimes misses crucial information that is only available in GEO, under the individual sample records. For example, in the record of the sample linked above you can find the following:

Sequenced reads were trimmed for adaptor sequence and low-quality sequence (bbduk; minlength=30, qtrim=rl, trimq=15)
Reads were then mapped to the reference genome of Mus musculus (GRCm38) using STAR aligner version 2.5.3a with parameters --quantMode GeneCounts --runThreadN 4
Assembly: GRCm38
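
For context, GEO also serves sample records as SOFT-format text, where each attribute is a line of the form `!Sample_<field> = value` and multi-step fields such as data processing repeat the same key once per step. The sketch below shows how such text could be parsed with the standard library. The excerpt is written by hand from the lines quoted above rather than downloaded, and `parse_soft_sample` is a hypothetical helper, not part of pysradb.

```python
from collections import defaultdict

# Hand-written excerpt in GEO SOFT layout, built from the lines quoted above.
# It is not a download of the real record.
SOFT_TEXT = """\
^SAMPLE = GSM6005004
!Sample_data_processing = Sequenced reads were trimmed for adaptor sequence and low-quality sequence (bbduk; minlength=30, qtrim=rl, trimq=15)
!Sample_data_processing = Reads were then mapped to the reference genome of Mus musculus (GRCm38) using STAR aligner version 2.5.3a with parameters --quantMode GeneCounts --runThreadN 4
!Sample_data_processing = Assembly: GRCm38
"""


def parse_soft_sample(text):
    """Collect SOFT sample attributes into a dict of lists (hypothetical helper)."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first "=" only, because values may contain "=" themselves.
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if key == "^SAMPLE":
            fields["accession"].append(value)
        elif key.startswith("!Sample_"):
            # Repeated keys (one per processing step) accumulate in order.
            fields[key[len("!Sample_"):]].append(value)
    return dict(fields)


record = parse_soft_sample(SOFT_TEXT)
# Flatten the steps into one string so they fit a single table cell.
data_processing = " | ".join(record["data_processing"])
```

Joining the steps with a separator is only one option; keeping them as a list would preserve the step boundaries.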

It would be nice to get that into the sra_metadata table too, if that is possible. I guess for now I could just use geoquery for that and then merge the two tables by GSM sample ID, although I would need to test that. In that case the hassle of including this here would probably be redundant. But it still seems like a nice direction that one could take to expand this :)
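
A minimal sketch of that workaround, assuming both tables can be keyed by GSM ID. The column name `gsm`, the run labels, and the second accession are made up for illustration; which column of the sra_metadata output actually holds the GSM ID would need to be checked.

```python
def left_join_on_gsm(sra_rows, geo_by_gsm, key="gsm"):
    """Attach GEO fields to each metadata row; rows without a match stay unchanged."""
    merged = []
    for row in sra_rows:
        extra = geo_by_gsm.get(row[key], {})
        merged.append({**row, **extra})
    return merged


# Illustrative rows only; "gsm" is an assumed column name, not a pysradb one.
sra_rows = [
    {"run": "run_1", "gsm": "GSM6005004"},
    {"run": "run_2", "gsm": "GSM0000000"},  # no GEO entry in the lookup below
]
geo_by_gsm = {
    "GSM6005004": {"data_processing": "Assembly: GRCm38"},
}

merged = left_join_on_gsm(sra_rows, geo_by_gsm)
```

If both tables are pandas DataFrames, `DataFrame.merge` with `how="left"` on the shared GSM column expresses the same join.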

Thank you for your work so far!

ajandria, Apr 11 '23 12:04