ncbi-genome-download
Not downloading suppressed or replaced RefSeq assembly accessions
Dear all,
I am using a list of RefSeq assembly accessions that I constructed a few months ago to download the corresponding FASTA files. When I try to download this list again, some of the FASTA files are not downloaded.
It seems that the missing downloads are from RefSeq assembly accessions that are "replaced" (e.g. https://www.ncbi.nlm.nih.gov/assembly/GCF_000699585.1/) or "suppressed" (e.g. https://www.ncbi.nlm.nih.gov/assembly/GCF_000155855.1/).
For reproducibility, is there any way to download these as well?
Best regards, Bob
:+1: Also having this issue.
Looks like the reason for this is that the FTP download URL is fetched via the "*summary.txt" file (example), which only contains the latest accession versions and doesn't list old ones.
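A rough illustration of the lookup described above, not ncbi-genome-download's actual code: the function name `parse_assembly_summary` and the sample rows are made up, and a real assembly_summary.txt has many more columns, which is why the sketch finds `assembly_accession` and `ftp_path` by header name instead of by position. An accession that has been dropped from the current file simply has no entry to look up.

```python
def parse_assembly_summary(text):
    """Return {assembly_accession: ftp_path} for an assembly summary file body."""
    header = None
    paths = {}
    for line in text.splitlines():
        if line.startswith("#"):
            # Assumption: the last '#' line before the data holds the column names.
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None or not line.strip():
            continue
        record = dict(zip(header, line.split("\t")))
        paths[record["assembly_accession"]] = record["ftp_path"]
    return paths


# Made-up excerpt with only the two columns this sketch needs.
CURRENT = (
    "#   See README_assembly_summary.txt for column descriptions\n"
    "# assembly_accession\tftp_path\n"
    "GCF_000000001.2\thttps://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/000/001/GCF_000000001.2_ASM1v2\n"
)

paths = parse_assembly_summary(CURRENT)
print("GCF_000000001.2" in paths)  # True
print(paths.get("GCF_000000001.1"))  # None: the replaced version is not listed
```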
I'm running into this issue as well. I'm guessing that no one has found a good solution. Would it be possible to automatically switch to the accession that has replaced the old (replaced/suppressed) accession?
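One way to picture the automatic switch is a version bump: strip the ".N" suffix and pick the highest version of the same base accession that is still listed. This is a hypothetical helper (`latest_version` is not a ncbi-genome-download feature), and it only covers the case where the replacement shares the base accession; a suppressed assembly with no newer version, or one replaced by an unrelated accession, returns None.

```python
def latest_version(accession, available):
    """Return the highest-versioned entry in `available` with the same base accession.

    Assumes every entry in `available` is versioned, e.g. "GCF_000000001.2".
    Returns None if no version of that base accession is available.
    """
    base = accession.split(".")[0]
    candidates = [a for a in available if a.split(".")[0] == base]
    if not candidates:
        return None
    # Compare versions numerically so that .10 sorts after .9.
    return max(candidates, key=lambda a: int(a.split(".")[1]))


current = ["GCF_000000001.2", "GCF_000000002.1"]  # made-up accessions
print(latest_version("GCF_000000001.1", current))  # GCF_000000001.2
print(latest_version("GCF_000000003.1", current))  # None
```

Note that silently swapping versions works against the reproducibility goal in the original question, so a tool doing this would probably want to report every substitution it makes.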
Keep in mind that ncbi-genome-download is just a fancy frontend for the NCBI FTP server, using the assembly summary files to get all the info. If the NCBI deletes a line from that file, for all ncbi-genome-download cares, that entry is gone.
It appears that the assembly_summary_historical.txt file could be used. The 18th column lists the latest assembly for those assemblies that have been suppressed/replaced.
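A minimal sketch of that fallback, assuming both files share the usual tab-separated layout with a `#`-prefixed header line: look the accession up in the current summary first, then in the historical one, and build the genomic FASTA URL from its `ftp_path` using NCBI's `<dir>/<dir>_genomic.fna.gz` naming. The helper names and sample rows are hypothetical. The sketch reads `ftp_path` by header name rather than relying on the 18th column, so it is worth checking which column holds what against NCBI's README_assembly_summary.txt.

```python
def parse_summary(text):
    """Return {assembly_accession: row dict} for an assembly summary file body."""
    header, rows = None, {}
    for line in text.splitlines():
        if line.startswith("#"):
            # Assumption: the last '#' line holds the tab-separated column names.
            header = line.lstrip("#").strip().split("\t")
        elif header and line.strip():
            row = dict(zip(header, line.split("\t")))
            rows[row["assembly_accession"]] = row
    return rows


def genomic_fasta_url(accession, current, historical):
    """Look up `accession` in the current summary, then fall back to the historical one."""
    row = current.get(accession) or historical.get(accession)
    if row is None or row.get("ftp_path", "na") == "na":
        return None  # unknown accession, or (assumed) no FTP directory recorded
    ftp_path = row["ftp_path"].rstrip("/")
    # The genomic FASTA is named after the assembly directory itself.
    return f"{ftp_path}/{ftp_path.rsplit('/', 1)[1]}_genomic.fna.gz"


# Made-up excerpts; real files carry many more columns.
CURRENT = (
    "# assembly_accession\tftp_path\n"
    "GCF_000000001.2\thttps://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/000/001/GCF_000000001.2_ASM1v2\n"
)
HISTORICAL = (
    "# assembly_accession\tftp_path\n"
    "GCF_000000001.1\thttps://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/000/001/GCF_000000001.1_ASM1v1\n"
)

cur, hist = parse_summary(CURRENT), parse_summary(HISTORICAL)
print(genomic_fasta_url("GCF_000000001.1", cur, hist))
```

Keeping the exact old version (rather than switching to its replacement) is what makes this useful for re-running an accession list months later.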