delimiter in `pysradb metadata --detailed` output

Open jbloom opened this issue 2 years ago • 3 comments

When I run a command like:

pysradb metadata --detailed SRR11085797

the resulting output has inconsistent whitespace. In particular, the "header line" has tab delimiters between columns, but the subsequent data line has space delimiters. This makes parsing of the output difficult (impossible when some of the data fields have whitespace in the values).

This is with pysradb 1.1.0.
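To illustrate why the mixed delimiters matter, here is a minimal sketch. The two lines are made up to mimic the layout described above (tab-delimited header, space-delimited data); they are not actual pysradb output, and the accession and values are placeholders.

```python
# Hypothetical lines mimicking the described layout: a tab-delimited header
# followed by a space-delimited data row. All values are invented placeholders.
header = "run_accession\tstudy_title\torganism_name"
row = "SRR0000001 An example title Homo sapiens"

columns = header.split("\t")  # splitting the header on tabs recovers the columns
fields = row.split()          # splitting the row on whitespace breaks multi-word values

print(len(columns))  # 3
print(len(fields))   # 6
```

Because the title and organism name contain spaces, the data row splits into more fields than there are columns, and nothing in the line says where one value ends and the next begins.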

jbloom avatar Jan 08 '22 23:01 jbloom

Sorry, I have had trouble handling this in a general way in the past when the output is written to the terminal. However, if you write the output to disk using `--saveto output.tsv`, the resulting `output.tsv` is properly formatted. Another option is to use the Python API, as shown in this notebook:

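from pysradb.sraweb import SRAweb

# Create a client that queries SRA over the web
db = SRAweb()

# Fetch the detailed metadata for the run as a data frame
df = db.sra_metadata('SRR11085797', detailed=True)
df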
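For the `--saveto` route, any tab-aware parser should be able to read the saved file. The sketch below uses Python's standard `csv` module on an in-memory stand-in for `output.tsv`. The column subset and row values are invented for illustration, and it assumes the saved file is plain tab-delimited text with a header row.

```python
import csv
import io

# Stand-in for the contents of output.tsv; the row values are invented.
# In practice, open the file written by `--saveto output.tsv` instead,
# e.g. open("output.tsv", newline="").
sample_tsv = (
    "run_accession\tstudy_title\torganism_name\n"
    "SRR0000001\tAn example title\tHomo sapiens\n"
)

# delimiter="\t" keeps values that contain spaces in a single field
reader = csv.DictReader(io.StringIO(sample_tsv), delimiter="\t")
rows = list(reader)

print(rows[0]["study_title"])  # An example title
```

If pandas is already in use, `pandas.read_csv("output.tsv", sep="\t")` should give an equivalent data frame.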

saketkc avatar Jan 09 '22 01:01 saketkc

Dear Saket,

The SRAweb approach is helpful for getting a tabular data frame. However, I think there is a typo in one of the column headers.

Listing all the headers:

list(df)

Result:

['run_accession',
 'study_accession',
 'study_title',
 'experiment_accession',
 'experiment_title',
 'experiment_desc',
 'organism_taxid ',
 'organism_name',
 'library_name',
 'library_strategy',
 'library_source',
...]

Note: There is a trailing space after organism_taxid, so the column is actually named 'organism_taxid '. You may want to remove the space, as selecting the column by its expected name can otherwise fail (for example, with a KeyError).
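Until a fixed version is installed, one possible workaround is to strip whitespace from the column names before using them. A minimal sketch on a plain list, using a subset of the header names listed above:

```python
# Subset of the column names listed above, including the trailing space
# on 'organism_taxid '.
columns = ["run_accession", "study_accession", "organism_taxid ", "organism_name"]

print("organism_taxid" in columns)  # False

# Strip leading and trailing whitespace from every name
cleaned = [name.strip() for name in columns]

print("organism_taxid" in cleaned)  # True
```

On the data frame itself, the equivalent would be `df.columns = df.columns.str.strip()`.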

Best regards, Chong

ChongLC avatar Sep 03 '22 11:09 ChongLC

Thanks @ChongLC! I have fixed this on the master branch.

saketkc avatar Sep 03 '22 15:09 saketkc

This is now fixed in the develop branch. https://github.com/saketkc/pysradb/commit/9fa31da07cecde71b6886043645b01022394718d

saketkc avatar May 16 '23 13:05 saketkc