pysradb
delimiter in `pysradb metadata --detailed` output
When I run a command like:
pysradb metadata --detailed SRR11085797
the resulting output has inconsistent whitespace. In particular, the "header line" has tab delimiters between columns, but the subsequent data line has space delimiters. This makes parsing of the output difficult (impossible when some of the data fields have whitespace in the values).
This is with pysradb 1.1.0.
Sorry, I have had issues handling this universally in the past when the output is written to the terminal. However, if you write the output to disk using `--saveto output.tsv`, the resulting output.tsv is properly formatted. Another option is to use the Python API, as shown in this notebook.
from pysradb.sraweb import SRAweb
db = SRAweb()
df = db.sra_metadata('SRR11085797', detailed=True)
df
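For the `--saveto` route mentioned above, here is a minimal sketch of how the saved file might be parsed with only the standard library. It assumes the file is tab-delimited with a single header row, as described. The `read_tsv` helper and the sample row are hypothetical and only stand in for a real output.tsv.

```python
import csv
import tempfile
from pathlib import Path


def read_tsv(path):
    """Read a tab-delimited file into a list of dicts keyed by header name."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Stand-in for a file written with --saveto; the title value is made up
# to show that spaces inside a field survive tab-delimited parsing.
sample = "run_accession\tstudy_title\nSRR11085797\tplaceholder title with spaces\n"
tsv_path = Path(tempfile.mkdtemp()) / "output.tsv"
tsv_path.write_text(sample)

rows = read_tsv(tsv_path)
print(rows[0]["study_title"])
```

Because the split happens only on tabs, fields that contain spaces stay intact, which is the case the terminal output cannot handle.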
Dear Saket,
The SRAweb API is helpful for getting a tabular data frame. However, I think there is a typo in one of the column headers.
Listing all the headers:
list(df)
Result:
['run_accession',
'study_accession',
'study_title',
'experiment_accession',
'experiment_title',
'experiment_desc',
'organism_taxid ',
'organism_name',
'library_name',
'library_strategy',
'library_source',
...]
Note: There is a trailing space after organism_taxid. You may want to remove it, as it can cause an error when extracting that column.
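Until a fixed release is installed, one possible workaround is to strip whitespace from the column names before selecting columns. The sketch below uses a hypothetical helper, `clean_names`, on a shortened copy of the header list shown above.

```python
def clean_names(names):
    """Return the column names with leading/trailing whitespace removed."""
    return [name.strip() for name in names]


# Shortened header list from above, including the name with a trailing space.
headers = ["run_accession", "organism_taxid ", "organism_name"]
cleaned = clean_names(headers)
print(cleaned)
```

On the DataFrame from the earlier snippet this could be applied as `df.columns = clean_names(df.columns)`; pandas' own `df.columns.str.strip()` should have the same effect.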
Best regards, Chong
Thanks @ChongLC! I have fixed this on the master branch.
This is now fixed in the develop branch. https://github.com/saketkc/pysradb/commit/9fa31da07cecde71b6886043645b01022394718d