ncbi-genome-download

Not downloading suppressed or replaced RefSeq assembly accessions

Open BobFukkink opened this issue 4 years ago • 5 comments

Dear,

I am using a list of RefSeq assembly accessions that I constructed a few months ago to download the corresponding FASTA files. When I try to download this list again, some of the FASTA files are not downloaded.

It seems that the missing downloads belong to RefSeq assembly accessions that are "replaced" (e.g. https://www.ncbi.nlm.nih.gov/assembly/GCF_000699585.1/) or "suppressed" (e.g. https://www.ncbi.nlm.nih.gov/assembly/GCF_000155855.1/).

For reproducibility, is there any way of downloading these as well?

Best regards, Bob

BobFukkink avatar Oct 26 '20 16:10 BobFukkink

:+1: Also having this issue.

jayrbolton avatar Nov 17 '20 23:11 jayrbolton

It looks like this happens because the FTP download URL is looked up in the "*summary.txt" file, which contains only the latest accession versions and doesn't list old ones.
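
To illustrate why an old accession simply disappears, here is a minimal sketch of an accession-to-URL lookup over summary-style lines. It is not the ncbi-genome-download code: `index_summary` is a hypothetical helper, the column names `assembly_accession` and `ftp_path` are assumed to match the header line of the summary file, and the sample rows, accessions, and URLs are made up.

```python
def index_summary(lines):
    """Map assembly_accession -> ftp_path from assembly-summary-style lines.

    Assumes tab-separated rows and a '#'-prefixed header line that names
    the columns, including 'assembly_accession' and 'ftp_path'.
    """
    header = None
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Comment lines start with '#'; one of them carries the column names.
            fields = [f.strip() for f in line.lstrip("#").split("\t")]
            if "assembly_accession" in fields:
                header = fields
            continue
        if header is None:
            continue  # data before a header cannot be interpreted
        row = dict(zip(header, line.split("\t")))
        index[row["assembly_accession"]] = row.get("ftp_path", "")
    return index


# Made-up sample: only the newest version of the assembly is listed.
SAMPLE = [
    "#   free-text comment line",
    "# assembly_accession\tversion_status\tftp_path",
    "GCF_900000001.2\tlatest\thttps://example.invalid/GCF_900000001.2",
]

index = index_summary(SAMPLE)
print(index.get("GCF_900000001.2"))  # the listed version resolves to a URL
print(index.get("GCF_900000001.1"))  # the older version is absent, so this is None
```

With only the current summary as input, there is no row to resolve the older version against, which matches the behaviour described above.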

jayrbolton avatar Nov 18 '20 00:11 jayrbolton

I'm running into this issue as well. I'm guessing that no one has found a good solution. Would it be possible to automatically switch to the accession that has replaced the old (replaced/suppressed) accession?

nick-youngblut avatar Aug 05 '21 09:08 nick-youngblut

Keep in mind that ncbi-genome-download is just a fancy frontend for the NCBI FTP server, using the assembly summary files to get all the info. If the NCBI deletes a line from that file, for all ncbi-genome-download cares, that entry is gone.

kblin avatar Aug 05 '21 09:08 kblin

> Keep in mind that ncbi-genome-download is just a fancy frontend for the NCBI FTP server, using the assembly summary files to get all the info. If the NCBI deletes a line from that file, for all ncbi-genome-download cares, that entry is gone.

It appears that the assembly_summary_historical.txt could be used. The 18th column lists the latest assembly for those assemblies that have been suppressed/replaced.
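
A rough sketch of that fallback, under stated assumptions: look in the current summary first, then in the historical one, and report where the accession was found. `read_summary` and `locate` are hypothetical helpers, not part of ncbi-genome-download. Columns are read by name from the '#' header (`assembly_accession`, `version_status`, `ftp_path` are assumed) rather than by position, and the sample rows are made up. Whether the files behind a historical `ftp_path` are still on the server is not something this sketch checks.

```python
def read_summary(lines):
    """Yield one dict per data row, keyed by the names in the '#' header line."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            fields = [f.strip() for f in line.lstrip("#").split("\t")]
            if "assembly_accession" in fields:
                header = fields
        elif line and header:
            yield dict(zip(header, line.split("\t")))


def locate(accession, current_lines, historical_lines):
    """Return (source, row) for an accession, preferring the current summary.

    Returns (None, None) when the accession is in neither file.
    """
    sources = (("current", current_lines), ("historical", historical_lines))
    for source, lines in sources:
        for row in read_summary(lines):
            if row["assembly_accession"] == accession:
                return source, row
    return None, None


# Made-up rows; the real files have many more columns.
HEADER = "# assembly_accession\tversion_status\tftp_path"
CURRENT = [
    HEADER,
    "GCF_900000001.2\tlatest\thttps://example.invalid/GCF_900000001.2",
]
HISTORICAL = [
    HEADER,
    "GCF_900000001.1\treplaced\thttps://example.invalid/GCF_900000001.1",
    "GCF_900000002.1\tsuppressed\tna",
]

source, row = locate("GCF_900000001.1", CURRENT, HISTORICAL)
print(source, row["version_status"], row["ftp_path"])
```

Returning the source alongside the row lets a caller decide what to do with a replaced or suppressed hit, for example warn, skip, or try the listed path anyway.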

nick-youngblut avatar Aug 05 '21 09:08 nick-youngblut