annotation_info missing for GCF_000002945.1 in genome/accession/{accession}/dataset_report

Open manulera opened this issue 1 year ago • 4 comments

Hi @olearyna,

I was using the field annotation_info from genome/accession/{accession}/dataset_report to tell users whether a given assembly has annotations. Since yesterday, it seems that annotation_info is missing from the response for GCF_000002945.1.

Compare:

  • Request with GCF_000002945.1, missing annotation_info: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report
  • Request with GCA_000002945.2 (synonym), returns annotation_info: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_000002945.2/dataset_report

Is this intentional? And is there a better way to check whether a given assembly has annotations?

The annotations can still be accessed for GCF_000002945.1 anyway, see https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/annotation_report?search_text=ase1

manulera avatar Jun 25 '24 08:06 manulera

Similarly, this response is empty (when setting has_annotation=true).

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.has_annotation=true

I guess the meaning of this might be that the annotation comes from the paired assembly GCA_000002945.2?

manulera avatar Jun 25 '24 08:06 manulera

I figured I can use this endpoint instead to check for the annotation being present.

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCA_006386175.1/annotation_report/download_summary

However, this endpoint gives the same error (404) when using an invalid accession and when using an accession that does not exist. The nice thing about the dataset_report endpoint was that a single request told you both whether the accession number exists and whether it has annotations.
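For illustration, the single-request check described here could be sketched as below. This is only a sketch: it assumes the parsed dataset_report JSON has a top-level "reports" list whose entries carry an "annotation_info" object when the assembly is annotated, and the function name `annotation_status` is hypothetical. It takes an already-parsed response, so no network call is involved.

```python
from typing import Any, Mapping


def annotation_status(payload: Mapping[str, Any]) -> str:
    """Classify a parsed dataset_report response (hypothetical helper).

    Assumed payload shape (not confirmed by this thread):
        {"reports": [{"accession": "...", "annotation_info": {...}}]}
    """
    # An empty or missing "reports" list means no report came back for the
    # accession (unknown accession, or excluded by a filter).
    reports = payload.get("reports") or []
    if not reports:
        return "no_report"
    # A non-empty annotation_info is treated as "the assembly has annotations".
    if reports[0].get("annotation_info"):
        return "annotated"
    return "unannotated"
```

With this, a caller can raise an error on "no_report" and show a warning on "unannotated" from one response.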

manulera avatar Jun 25 '24 09:06 manulera

Hi manulera

Thanks for opening this issue. GCF_000002945.1 was recently updated to version 2 but there is an issue with the data release for the new version. We hope to get it resolved soon.

To check whether there is an annotation, this is the correct URL: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.has_annotation=true. It should work once the bug with the version update is fixed. Additionally, we are looking into a better response for when a genome is not annotated.

I'll ping the issue when the version release is fixed.

Nuala

olearyna avatar Jun 25 '24 20:06 olearyna

Hi manulera,

The issue with the release of GCF_000002945.2 has been fixed. You can view the data report for this latest version here.

The previous version has also been fixed. To view a data report for a non-latest assembly, you need to append a filter for all assemblies to the URL: https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_000002945.1/dataset_report?filters.assembly_version=all_assemblies
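As a rough sketch, the filtered URLs mentioned in this thread could be built like this. The helper name `dataset_report_url` and its keyword arguments are hypothetical. Only the two query parameters that appear above (filters.assembly_version and filters.has_annotation) are used.

```python
from urllib.parse import urlencode

# Base path taken from the URLs quoted in this thread.
BASE_URL = "https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession"


def dataset_report_url(
    accession: str,
    all_versions: bool = False,
    annotated_only: bool = False,
) -> str:
    """Build a dataset_report URL with optional filters (hypothetical helper)."""
    params = {}
    if all_versions:
        # Needed to get a report for a non-latest assembly version.
        params["filters.assembly_version"] = "all_assemblies"
    if annotated_only:
        # Restricts the response to annotated assemblies.
        params["filters.has_annotation"] = "true"
    url = f"{BASE_URL}/{accession}/dataset_report"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `dataset_report_url("GCF_000002945.1", all_versions=True)` reproduces the URL given above for the previous assembly version.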

Let me know if you have any more issues.

Nuala

olearyna avatar Jun 28 '24 17:06 olearyna

Hi @olearyna, thank you so much for fixing this and for the follow-up explanation. Now I can use a single request to give a warning if there are newer assemblies and an error if the current assembly does not have annotations.

manulera avatar Jul 16 '24 14:07 manulera

A follow-up to this: would it be possible to also get old assembly accession numbers with this endpoint, using something like ?filters.assembly_version=all_assemblies?

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/sequence_accession/{sequence_accession}/sequence_assemblies

manulera avatar Jul 16 '24 15:07 manulera

Hi Manu,

Thanks for the suggestion. We'll look into adding this filter.

All the best,

Nuala

olearyna avatar Jul 16 '24 15:07 olearyna

Hi @olearyna, any chance that this will ever be supported? It would be nice to have an endpoint to validate a link between a sequence identifier and an assembly identifier. For example:

https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/sequence_accession/NC_003424.3/sequence_assemblies

NC_003424.3 is part of both GCF_000002945.1 and GCF_000002945.2, but the endpoint only returns GCF_000002945.2.

manulera avatar Nov 18 '25 14:11 manulera