annotation_info missing for GCF_000002945.1 in genome/accession/{accession}/dataset_report
Hi @olearyna,
I was using the field annotation_info from genome/accession/{accession}/dataset_report to tell users whether a given assembly has annotations. Since yesterday, it seems that annotation_info is missing from the response for GCF_000002945.1.
Compare:
- Request with GCF_000002945.1, missing annotation_info: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report
- Request with GCA_000002945.2 (synonym), returns annotation_info: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_000002945.2/dataset_report
Is this intentional? And is there a better way to check whether a given assembly has annotations?
The annotations can still be accessed for GCF_000002945.1 anyway, see https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/annotation_report?search_text=ase1
Similarly, this response is empty (when setting has_annotation=true):
https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.has_annotation=true
I guess the meaning of this might be that the annotation comes from the paired assembly GCA_000002945.2?
I figured I can use this endpoint instead to check for the annotation being present.
https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_006386175.1/annotation_report/download_summary
However, this endpoint gives the same error (404) when using an invalid accession and when using an accession that does not exist. The nice thing about the dataset_report endpoint was that a single request told you both whether the accession exists and whether it has annotations.
Hi manulera
Thanks for opening this issue. GCF_000002945.1 was recently updated to version 2 but there is an issue with the data release for the new version. We hope to get it resolved soon.
To check whether there is an annotation, this is the correct URL: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.has_annotation=true. It should work once the bug with the version update is fixed. Additionally, we are looking into a better response for when a genome is not annotated.
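For anyone building these requests programmatically, here is a minimal sketch of composing the filtered URL. The helper name `dataset_report_url` and the convention of mapping keyword arguments onto `filters.<name>` query parameters are illustrative assumptions, not part of the NCBI client libraries.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha"


def dataset_report_url(accession: str, **filters: str) -> str:
    """Build a dataset_report URL (hypothetical helper).

    Each keyword argument becomes a `filters.<name>=<value>` query parameter.
    """
    url = f"{BASE_URL}/genome/accession/{quote(accession, safe='')}/dataset_report"
    if filters:
        # urlencode leaves '.' and '_' untouched, so the filter names stay readable.
        query = urlencode({f"filters.{key}": value for key, value in filters.items()})
        url = f"{url}?{query}"
    return url


# Reproduces the has_annotation URL quoted above.
print(dataset_report_url("GCF_000002945.1", has_annotation="true"))
```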
I'll ping the issue when the version release is fixed.
Nuala
Hi manulera,
The issue with the release of GCF_000002945.2 has been fixed. You can view the data report for this latest version here
The previous version has also been fixed. To view a data report for a non-latest assembly, you need to append a filter for all assemblies to the URL: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.assembly_version=all_assemblies
Let me know if you have any more issues.
Nuala
Hi @olearyna, thank you so much for fixing this and for the follow-up explanation. Now I can use a single request to give a warning if there are newer assemblies and an error if the current assembly does not have annotations.
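For reference, the single-request check could look roughly like the sketch below. It operates on the already-decoded JSON of a dataset_report response requested with filters.assembly_version=all_assemblies. The response shape is an assumption: a top-level `reports` list whose entries carry `accession`, `current_accession` (the latest version) and, when annotated, `annotation_info`. The function name `check_assembly` is hypothetical.

```python
from typing import Any


def check_assembly(payload: dict[str, Any], accession: str) -> tuple[list[str], list[str]]:
    """Return (warnings, errors) for one accession.

    Field names `reports`, `accession`, `current_accession` and
    `annotation_info` are assumed, not confirmed in this thread.
    """
    warnings: list[str] = []
    errors: list[str] = []
    reports = payload.get("reports") or []
    # Pick the report that matches the requested accession exactly.
    report = next((r for r in reports if r.get("accession") == accession), None)
    if report is None:
        errors.append(f"{accession}: accession not found")
        return warnings, errors
    # If the field is absent, treat the requested accession as current.
    current = report.get("current_accession", accession)
    if current != accession:
        warnings.append(f"{accession}: newer assembly available ({current})")
    if not report.get("annotation_info"):
        errors.append(f"{accession}: assembly has no annotations")
    return warnings, errors
```

Returning two lists rather than raising lets the caller decide how to surface warnings and errors to users.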
As a follow-up, would it be possible to also get old assembly accession numbers from this endpoint, using something like ?filters.assembly_version=all_assemblies?
https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/sequence_accession/{sequence_accession}/sequence_assemblies
Hi Manu,
Thanks for the suggestion, we'll look into adding this filter.
All the best,
Nuala
Hi @olearyna, any chance that this will ever be supported? It would be nice to have an endpoint to validate a link between a sequence identifier and an assembly identifier. For example:
https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/sequence_accession/NC_003424.3/sequence_assemblies
NC_003424.3 is part of both GCF_000002945.1 and GCF_000002945.2, but the endpoint only returns GCF_000002945.2.
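Until such a filter exists, a client-side stopgap might compare accessions while ignoring the version suffix. The sketch below assumes the assembly accessions returned by sequence_assemblies have already been extracted into a plain list; the function names and status labels are hypothetical. Note that an "other_version" result only signals that a different version of the same assembly was returned. It does not prove that the sequence belongs to the requested version.

```python
from typing import Optional


def split_version(accession: str) -> tuple[str, Optional[int]]:
    """Split 'GCF_000002945.2' into ('GCF_000002945', 2)."""
    base, sep, version = accession.rpartition(".")
    if sep and version.isdigit():
        return base, int(version)
    # No numeric version suffix: return the accession unchanged.
    return accession, None


def link_status(returned: list[str], assembly: str) -> str:
    """Return 'exact', 'other_version' or 'none' for the requested assembly."""
    if assembly in returned:
        return "exact"
    base, _ = split_version(assembly)
    if any(split_version(acc)[0] == base for acc in returned):
        return "other_version"
    return "none"


# The endpoint currently returns only the latest version for NC_003424.3.
status = link_status(["GCF_000002945.2"], "GCF_000002945.1")
```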