
[BUG] missing run-related entries for experiments with high number of runs

Open masarunakajima opened this issue 4 months ago • 0 comments

Describe the bug Run-related data are missing for experiments that have many runs. When I run pysradb metadata SRP245574, some rows have empty run_accession, run_total_spots, and run_total_bases fields. I don't think this part is a bug per se, because those data really are missing from the NCBI response returned by the function get_esummary_response. It seems to happen for experiments that have many runs (I am not sure what the threshold is).

However, those data are accessible through get_efetch_response, which is used when --detailed is selected. Because of this, I believe the merge of the get_esummary_response and get_efetch_response results in SRAweb.sra_metadata does not produce the dataframe we expect. For example, pysradb metadata --detailed SRP245574 outputs a table in which many rows have a missing experiment accession. Those rows correspond to runs that were absent from the get_esummary_response results but present in the get_efetch_response results.
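A rough sketch of the merge behavior I suspect, using plain dicts and made-up accessions (EXP_A, RUN_1, and so on) instead of real NCBI responses. The helper outer_join_on_run and the extra field are hypothetical and are not pysradb code. The sketch only assumes that the two result sets are joined on run_accession, which I have not confirmed against SRAweb.sra_metadata:

```python
# Toy stand-ins for the two result sets; all values are made up.
summary_rows = [
    {"experiment_accession": "EXP_A", "run_accession": "RUN_1"},
    # Experiment with many runs: the summary row carries no run information.
    {"experiment_accession": "EXP_B", "run_accession": None},
]
detailed_rows = [
    {"run_accession": "RUN_1", "extra": "x"},
    {"run_accession": "RUN_2", "extra": "y"},  # run of EXP_B, unknown to the summary
    {"run_accession": "RUN_3", "extra": "z"},  # run of EXP_B, unknown to the summary
]


def outer_join_on_run(summary, detailed):
    """Outer-join two lists of dicts on run_accession (illustration only)."""
    summary_by_run = {
        row["run_accession"]: row
        for row in summary
        if row["run_accession"] is not None
    }
    merged = []
    seen_runs = set()
    # Every detailed row is kept; the experiment accession is only known
    # when the same run also appeared in the summary.
    for row in detailed:
        match = summary_by_run.get(row["run_accession"])
        experiment = match["experiment_accession"] if match else None
        merged.append({"experiment_accession": experiment, **row})
        seen_runs.add(row["run_accession"])
    # Summary rows with no matching detailed row are kept as well.
    for row in summary:
        if row["run_accession"] not in seen_runs:
            merged.append({**row, "extra": None})
    return merged


merged = outer_join_on_run(summary_rows, detailed_rows)
# Runs that ended up without an experiment accession after the join.
orphan_runs = [
    row["run_accession"] for row in merged if row["experiment_accession"] is None
]
print(orphan_runs)
```

In this toy case RUN_2 and RUN_3 end up without an experiment accession, and EXP_B ends up on a separate row without any run, which matches the pattern I see in the --detailed output.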

To Reproduce Steps to reproduce the behavior:

  • pysradb metadata SRP245574
  • pysradb metadata --detailed SRP245574

Desktop:

  • OS: Linux MSI 5.15.133.1-microsoft-standard-WSL2
  • Python version: 3.10

Additional context I would like to work on this issue if it is okay.

masarunakajima, Feb 14 '24 23:02