Add publication info to assembly or BioProject fields.

Open conchoecia opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.
I would like to be able to quickly determine which research article to cite just by looking at the output of datasets for a particular genome assembly or BioProject. This publication information is often shown on the NCBI website, but when I access the same assembly accession or BioProject programmatically, the output contains no reference to the publication.

For example, on the BioProject page for the common carp, there is a paper to cite: Chang YS et al., "The complete nucleotide sequence and gene organization of carp (Cyprinus carpio) mitochondrial genome.", J Mol Evol, 1994 Feb;38(2):138-55

However, when I query the same BioProject on the command line, the output contains no publication information:

datasets summary genome accession PRJNA682709

{"reports": [{"accession":"GCF_018340385.1","annotation_info":{"busco":{"busco_lineage":"actinopterygii_odb10","busco_ver":"4.1.4","complete":0.98571426,"duplicated":0.62445056,"fragmented":0.0054945056,"missing":0.008791209,"single_copy":0.36126372,"total_count":"3640"},"method":"Best-placed RefSeq; Gnomon","name":"NCBI Cyprinus carpio Annotation Release 101","pipeline":"NCBI eukaryotic genome annotation pipeline","provider":"NCBI RefSeq","release_date":"2021-07-20","report_url":"https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Cyprinus_carpio/101","software_version":"9.0","stats":{"gene_counts":{"non_coding":9553,"other":301,"protein_coding":43531,"pseudogene":6174,"total":59559}},"status":"Full annotation"},"assembly_info":{"assembly_level":"Chromosome","assembly_method":"wtdbg v. 2; quickmerge v. 1","assembly_name":"ASM1834038v1","assembly_status":"current","assembly_type":"haploid","bioproject_accession":"PRJNA682709","bioproject_lineage":[{"bioprojects":[{"accession":"PRJNA682709","title":"Cyprinus carpio isolate:SPL01 Genome sequencing and assembly"}]}],"biosample":{"accession":"SAMN17005855","attributes":[{"name":"isolate","value":"SPL01"},{"name":"dev_stage","value":"adult"},{"name":"sex","value":"not collected"},{"name":"tissue","value":"muscle"}],"bioprojects":[{"accession":"PRJNA682709"}],"description":{"organism":{"organism_name":"Cyprinus carpio","tax_id":7962},"title":"Model organism or animal sample from Cyprinus carpio"},"last_updated":"2021-05-13T06:39:37.300","models":["Model organism or animal"],"owner":{"contacts":[{}],"name":"Chinese Academy of Fishery Sciences"},"package":"Model.organism.animal.1.0","publication_date":"2021-05-13T06:39:37.300","sample_ids":[{"label":"Sample name","value":"Common_carp"}],"status":{"status":"live","when":"2021-05-13T06:39:37.300"},"submission_date":"2020-12-05T07:09:04.477"},"blast_url":"https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch\u0026PROG_DEF=blastn\u0026BLAST_SPEC=GDH_GCF_018340385.1","paired_assembly":{"accession":"GCA_018340385.1","only_genbank":"7 unlocalized scaffolds on chromosome MT","only_refseq":"chromosome MT","status":"current"},"refseq_category":"representative genome","release_date":"2021-05-12","sequencing_tech":"PacBio; Oxford Nanopore; Illumina HiSeq","submitter":"Chinese Academy of Fishery Sciences"},"assembly_stats":{"contig_l50":229,"contig_n50":1558716,"gc_count":"620441384","gc_percent":37,"genome_coverage":"184.8x","number_of_component_sequences":6700,"number_of_contigs":19837,"number_of_organelles":1,"number_of_scaffolds":6700,"scaffold_l50":24,"scaffold_n50":29545497,"total_number_of_chromosomes":50,"total_sequence_length":"1680118328","total_ungapped_length":"1672146419"},"current_accession":"GCF_018340385.1","organelle_info":[{"description":"Mitochondrion","submitter":"Chinese Academy of Fishery Sciences","total_seq_length":"16575"}],"organism":{"common_name":"common carp","infraspecific_names":{"isolate":"SPL01"},"organism_name":"Cyprinus carpio","tax_id":7962},"paired_accession":"GCA_018340385.1","source_database":"SOURCE_DATABASE_REFSEQ","wgs_info":{"master_wgs_url":"https://www.ncbi.nlm.nih.gov/nuccore/JAEOAB000000000.1","wgs_contigs_url":"https://www.ncbi.nlm.nih.gov/Traces/wgs/JAEOAB01","wgs_project_accession":"JAEOAB01"}},
{"accession":"GCA_018340385.1","assembly_info":{"assembly_level":"Chromosome","assembly_method":"wtdbg v. 2; quickmerge v. 1","assembly_name":"ASM1834038v1","assembly_status":"current","assembly_type":"haploid","bioproject_accession":"PRJNA682709","bioproject_lineage":[{"bioprojects":[{"accession":"PRJNA682709","title":"Cyprinus carpio isolate:SPL01 Genome sequencing and assembly"}]}],"biosample":{"accession":"SAMN17005855","attributes":[{"name":"isolate","value":"SPL01"},{"name":"dev_stage","value":"adult"},{"name":"sex","value":"not collected"},{"name":"tissue","value":"muscle"}],"bioprojects":[{"accession":"PRJNA682709"}],"description":{"organism":{"organism_name":"Cyprinus carpio","tax_id":7962},"title":"Model organism or animal sample from Cyprinus carpio"},"last_updated":"2021-05-13T06:39:37.300","models":["Model organism or animal"],"owner":{"contacts":[{}],"name":"Chinese Academy of Fishery Sciences"},"package":"Model.organism.animal.1.0","publication_date":"2021-05-13T06:39:37.300","sample_ids":[{"label":"Sample name","value":"Common_carp"}],"status":{"status":"live","when":"2021-05-13T06:39:37.300"},"submission_date":"2020-12-05T07:09:04.477"},"blast_url":"https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch\u0026PROG_DEF=blastn\u0026BLAST_SPEC=GDH_GCA_018340385.1","paired_assembly":{"accession":"GCF_018340385.1","annotation_name":"NCBI Cyprinus carpio Annotation Release 101","only_genbank":"7 unlocalized scaffolds on chromosome MT","only_refseq":"chromosome MT","status":"current"},"release_date":"2021-05-12","sequencing_tech":"PacBio; Oxford Nanopore; Illumina HiSeq","submitter":"Chinese Academy of Fishery Sciences"},"assembly_stats":{"contig_l50":229,"contig_n50":1558716,"gc_count":"620441384","gc_percent":37,"genome_coverage":"184.8x","number_of_component_sequences":6700,"number_of_contigs":19837,"number_of_organelles":1,"number_of_scaffolds":6700,"scaffold_l50":24,"scaffold_n50":29545497,"total_number_of_chromosomes":50,"total_sequence_length":"1680118328","total_ungapped_length":"1672146419"},"current_accession":"GCA_018340385.1","organelle_info":[{"description":"Mitochondrion","submitter":"Chinese Academy of Fishery Sciences"}],"organism":{"common_name":"common carp","infraspecific_names":{"isolate":"SPL01"},"organism_name":"Cyprinus carpio","tax_id":7962},"paired_accession":"GCF_018340385.1","source_database":"SOURCE_DATABASE_GENBANK","wgs_info":{"master_wgs_url":"https://www.ncbi.nlm.nih.gov/nuccore/JAEOAB000000000.1","wgs_contigs_url":"https://www.ncbi.nlm.nih.gov/Traces/wgs/JAEOAB01","wgs_project_accession":"JAEOAB01"}}],"total_count": 2}
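The gap can be checked mechanically. The Python sketch below assumes the report layout shown in the output above; the key path `reports[].assembly_info.bioproject_lineage[].bioprojects[].accession` is copied from this output and may differ in other datasets versions. It lists every key whose name contains "publication" and the BioProject accessions attached to each assembly. On the full output above, the only matches are the BioSample `publication_date` fields, which appear to be record release dates rather than articles to cite. `SAMPLE` is a trimmed excerpt of that output, and the helper names are hypothetical.

```python
import json
from typing import Any, Iterator

# Trimmed excerpt of the `datasets summary genome` report shown above.
SAMPLE = json.loads("""
{"reports": [{"accession": "GCF_018340385.1",
  "assembly_info": {
    "bioproject_lineage": [{"bioprojects": [{"accession": "PRJNA682709"}]}],
    "biosample": {"publication_date": "2021-05-13T06:39:37.300"}}}],
 "total_count": 1}
""")


def find_keys(node: Any, needle: str, path: str = "$") -> Iterator[str]:
    """Yield the JSON path of every object key whose name contains `needle`."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if needle in key:
                yield child
            # Keep descending so nested matches are also reported.
            yield from find_keys(value, needle, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_keys(item, needle, f"{path}[{index}]")


def bioprojects_per_assembly(report: dict) -> dict[str, list[str]]:
    """Map each assembly accession to the BioProject accessions in its lineage."""
    result: dict[str, list[str]] = {}
    for entry in report.get("reports", []):
        lineage = entry.get("assembly_info", {}).get("bioproject_lineage", [])
        result[entry["accession"]] = [
            project["accession"]
            for level in lineage
            for project in level.get("bioprojects", [])
        ]
    return result


if __name__ == "__main__":
    print(list(find_keys(SAMPLE, "publication")))
    print(bioprojects_per_assembly(SAMPLE))
```

To inspect live output instead of the excerpt, pipe the `datasets` command into the script and pass `json.load(sys.stdin)` in place of `SAMPLE`.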

Describe the solution you'd like
It would be nice to have a meaningful "publication" field listing the publication that should be cited whenever that data source is used!
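Until such a field exists, one possible workaround sits outside datasets entirely. NCBI's E-utilities can list the PubMed records linked to a BioProject, which is presumably where the citation on the BioProject web page comes from. The sketch below assumes the public `esearch.fcgi` and `elink.fcgi` endpoints with `retmode=json` and the JSON response shapes used in the parsers. The helper names are hypothetical. It has not been checked whether PRJNA682709 actually returns the paper cited above. NCBI limits clients without an API key to a few requests per second, so batch lookups should be throttled.

```python
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(accession: str) -> str:
    """Build an esearch URL that resolves a BioProject accession to its numeric UID."""
    query = urllib.parse.urlencode({"db": "bioproject", "term": accession, "retmode": "json"})
    return f"{EUTILS}/esearch.fcgi?{query}"


def elink_url(uid: str) -> str:
    """Build an elink URL asking for PubMed records linked to one BioProject UID."""
    query = urllib.parse.urlencode(
        {"dbfrom": "bioproject", "db": "pubmed", "id": uid, "retmode": "json"}
    )
    return f"{EUTILS}/elink.fcgi?{query}"


def uids_from_esearch(payload: dict) -> list[str]:
    """Extract the UID list from an esearch JSON response (assumed shape)."""
    return list(payload.get("esearchresult", {}).get("idlist", []))


def pmids_from_elink(payload: dict) -> list[str]:
    """Collect PubMed IDs from every PubMed link set in an elink JSON response."""
    pmids: list[str] = []
    for linkset in payload.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("dbto") == "pubmed":
                pmids.extend(str(link) for link in linksetdb.get("links", []))
    return pmids


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def pubmed_ids_for_bioproject(accession: str) -> list[str]:
    """Resolve a BioProject accession, then follow its links into PubMed."""
    pmids: list[str] = []
    for uid in uids_from_esearch(fetch_json(esearch_url(accession))):
        pmids.extend(pmids_from_elink(fetch_json(elink_url(uid))))
    # Drop duplicates while keeping the order NCBI returned.
    return list(dict.fromkeys(pmids))
```

Calling `pubmed_ids_for_bioproject("PRJNA682709")` makes at least two network requests and returns whatever PubMed IDs NCBI links to that project. The list may be empty. The URL builders and parsers are kept separate from the network call so they can be tested offline.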

Thank you!

conchoecia, May 24 '24 23:05

Hi conchoecia,

Thank you for opening this issue. While we won't be able to get to it any time soon, please know that we are exploring better ways to enhance data attribution. We'll keep this issue open until it is resolved.

Nuala

Nuala A. O'Leary, PhD Product Owner, NCBI Datasets National Center for Biotechnology Information, NLM, NIH, DHHS

olearyna, May 28 '24 12:05

Thanks for the response, @olearyna! Good luck with the development.

conchoecia, May 29 '24 09:05