pysradb
[BUG] missing run-related entries for experiments with high number of runs
Describe the bug
Run-related data are missing for experiments that have many runs. When I run pysradb metadata SRP245574, certain rows have missing entries for run_accession, run_total_spots, and run_total_bases. I don't think this part is a bug per se, as those data are really missing from the NCBI response returned by the function get_esummary_response. The missing information seems to occur for experiments that have many runs (I am not sure of the threshold). However, those data are accessible through get_efetch_response, which is used when --detailed is selected.
Because of this, I believe the merging of results from get_esummary_response and get_efetch_response in SRAweb.sra_metadata does not generate the dataframe we expect. For example, pysradb metadata --detailed SRP245574 outputs a table in which many rows have a missing experiment accession. Those rows correspond to the runs that were not included in the results from get_esummary_response but were included in those from get_efetch_response.
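The suspected merge behavior can be illustrated with a toy sketch in pandas. This is not the code in SRAweb.sra_metadata; the two dataframes, their values, and the column name efetch_experiment are invented for illustration, and it is an assumption that the efetch response carries enough information to recover the experiment accession for each run.

```python
import pandas as pd

# Toy stand-in for the esummary result: only two of the four runs are present.
esummary = pd.DataFrame({
    "experiment_accession": ["SRX1", "SRX1"],
    "run_accession": ["SRR1", "SRR2"],
    "run_total_spots": [100, 200],
})

# Toy stand-in for the efetch result: all four runs are present.
# "efetch_experiment" is a hypothetical column name.
efetch = pd.DataFrame({
    "run_accession": ["SRR1", "SRR2", "SRR3", "SRR4"],
    "efetch_experiment": ["SRX1", "SRX1", "SRX1", "SRX1"],
})

# An outer merge on the run accession keeps the runs that only efetch knows
# about, but their esummary-derived columns are left empty.
merged = esummary.merge(efetch, on="run_accession", how="outer")
missing = merged[merged["experiment_accession"].isna()]

# One possible repair: fill the gaps from the efetch side.
repaired = merged.assign(
    experiment_accession=merged["experiment_accession"].fillna(
        merged["efetch_experiment"]
    )
)

print(missing["run_accession"].tolist())
print(int(repaired["experiment_accession"].isna().sum()))
```

In this sketch the rows for SRR3 and SRR4 come out of the merge without an experiment accession, which resembles the table described above. Whether a fill of this kind is the right fix depends on what the efetch response actually contains.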
To Reproduce
Steps to reproduce the behavior:
pysradb metadata SRP245574
pysradb metadata --detailed SRP245574
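To quantify the gaps after running either command, a small helper along these lines may be useful. It is a sketch that assumes the metadata table has already been loaded into a pandas DataFrame and that missing entries appear as NaN/None; count_missing is a hypothetical helper, not part of pysradb, and the toy values are invented.

```python
import pandas as pd


def count_missing(df, columns):
    """Return {column: number of missing entries} for the columns present in df."""
    return {c: int(df[c].isna().sum()) for c in columns if c in df.columns}


# Toy table mimicking the symptom: one complete row, two rows without run data.
toy = pd.DataFrame({
    "experiment_accession": ["SRX1", "SRX2", "SRX3"],
    "run_accession": ["SRR1", None, None],
    "run_total_spots": [100.0, None, None],
    "run_total_bases": [5000.0, None, None],
})

report = count_missing(
    toy, ["run_accession", "run_total_spots", "run_total_bases"]
)
print(report)
```

If the real output encodes missing values differently (for example as a placeholder string), the check inside count_missing would need to be adapted.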
Desktop:
- OS: Linux MSI 5.15.133.1-microsoft-standard-WSL2
- Python version: 3.10
Additional context
I would like to work on this issue, if that is okay.