
Running with `--format fasta` creates an empty FASTA file

Status: Open · orenmn opened this issue 4 years ago · 2 comments

(This was already mentioned in https://github.com/kblin/ncbi-acc-download/issues/13#issuecomment-531677362, but I think it is better to have a separate issue.)

`ncbi-acc-download --format fasta --recursive --verbose AAXATB000000000.1` creates an empty FASTA file, while `ncbi-acc-download --format genbank --recursive --verbose AAXATB000000000.1` creates a GenBank file as expected.

The same thing happened when I tried `ACIN00000000.3`.

orenmn, Nov 08 '20 12:11

IIUC, an easy fix would be to implement `--format fasta` by downloading with `--format genbank` (the default) and then converting the result with Biopython's `SeqIO.convert`, e.g.: `SeqIO.convert('AAXATB000000000.1.gbk', 'genbank', 'AAXATB000000000.1.gbk.fasta', 'fasta')`

orenmn, Nov 08 '20 12:11
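A minimal sketch of the conversion step suggested above, assuming Biopython is installed and that the GenBank file produced by `--format genbank --recursive` actually contains sequence data (records without sequence cannot be written as FASTA). The helper name `genbank_to_fasta` is hypothetical and not part of ncbi-acc-download; only `SeqIO.convert` is a real Biopython call.

```python
from pathlib import Path

from Bio import SeqIO


def genbank_to_fasta(gbk_path, fasta_path=None):
    """Hypothetical helper: convert a GenBank file to FASTA.

    Returns the number of records written.
    """
    gbk_path = Path(gbk_path)
    if fasta_path is None:
        # Same naming as the SeqIO.convert call quoted in the issue:
        # foo.gbk -> foo.gbk.fasta, next to the input file.
        fasta_path = gbk_path.with_name(gbk_path.name + ".fasta")
    # SeqIO.convert(in_file, in_format, out_file, out_format) parses the
    # input lazily and returns how many records it wrote.
    return SeqIO.convert(str(gbk_path), "genbank", str(fasta_path), "fasta")
```

For example, `genbank_to_fasta('AAXATB000000000.1.gbk')` would write `AAXATB000000000.1.gbk.fasta` next to the input and return the number of records converted.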

The NCBI Entrez API does deliver FASTA files, just not when you query for WGS master entries. I don't really want to make all of ncbi-acc-download depend on Biopython, but arguably we could take that path when `--recursive` is specified, since `--recursive` already depends on Biopython.

kblin, Nov 08 '20 17:11
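The maintainer's proposal can be sketched as a small decision step: only when both `--format fasta` and `--recursive` are given, fetch GenBank instead and convert afterwards, so Biopython stays a `--recursive`-only dependency. `plan_download` and `DownloadPlan` are hypothetical names for illustration, not existing ncbi-acc-download code.

```python
from typing import NamedTuple


class DownloadPlan(NamedTuple):
    entrez_format: str      # format to request from NCBI Entrez
    convert_to_fasta: bool  # whether to post-process with Biopython's SeqIO.convert


def plan_download(requested_format: str, recursive: bool) -> DownloadPlan:
    """Hypothetical helper: decide what to fetch and whether to convert afterwards.

    Biopython is only assumed to be available when --recursive is set, so the
    GenBank-then-convert workaround is limited to that path; every other
    request is passed through to Entrez unchanged.
    """
    if requested_format == "fasta" and recursive:
        # Entrez returns no FASTA sequence for WGS master entries, so fetch
        # GenBank and convert it locally instead.
        return DownloadPlan("genbank", True)
    return DownloadPlan(requested_format, False)
```

Keeping the branch this narrow means plain `--format fasta` downloads still go straight to Entrez and never need Biopython.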