Add publication info to assembly or BioProject fields.
Is your feature request related to a problem? Please describe.
I would like to be able to quickly determine what research article I should reference simply from looking at the output of datasets for a particular genome assembly or BioProject. This publication information is often on the NCBI website, but when I programmatically access the same accession/BioProject, there is no reference to the publication.
For example, on the BioProject page for the common carp, there is a paper to cite: Chang YS et al., "The complete nucleotide sequence and gene organization of carp (Cyprinus carpio) mitochondrial genome.", J Mol Evol, 1994 Feb;38(2):138-55
However, when I query the same BioProject on the command line, there is no publication information:
datasets summary genome accession PRJNA682709
{"reports": [{"accession":"GCF_018340385.1","annotation_info":{"busco":{"busco_lineage":"actinopterygii_odb10","busco_ver":"4.1.4","complete":0.98571426,"duplicated":0.62445056,"fragmented":0.0054945056,"missing":0.008791209,"single_copy":0.36126372,"total_count":"3640"},"method":"Best-placed RefSeq; Gnomon","name":"NCBI Cyprinus carpio Annotation Release 101","pipeline":"NCBI eukaryotic genome annotation pipeline","provider":"NCBI RefSeq","release_date":"2021-07-20","report_url":"https://www.ncbi.nlm.nih.gov/genome/annotation_euk/Cyprinus_carpio/101","software_version":"9.0","stats":{"gene_counts":{"non_coding":9553,"other":301,"protein_coding":43531,"pseudogene":6174,"total":59559}},"status":"Full annotation"},"assembly_info":{"assembly_level":"Chromosome","assembly_method":"wtdbg v. 2; quickmerge v. 1","assembly_name":"ASM1834038v1","assembly_status":"current","assembly_type":"haploid","bioproject_accession":"PRJNA682709","bioproject_lineage":[{"bioprojects":[{"accession":"PRJNA682709","title":"Cyprinus carpio isolate:SPL01 Genome sequencing and assembly"}]}],"biosample":{"accession":"SAMN17005855","attributes":[{"name":"isolate","value":"SPL01"},{"name":"dev_stage","value":"adult"},{"name":"sex","value":"not collected"},{"name":"tissue","value":"muscle"}],"bioprojects":[{"accession":"PRJNA682709"}],"description":{"organism":{"organism_name":"Cyprinus carpio","tax_id":7962},"title":"Model organism or animal sample from Cyprinus carpio"},"last_updated":"2021-05-13T06:39:37.300","models":["Model organism or animal"],"owner":{"contacts":[{}],"name":"Chinese Academy of Fishery Sciences"},"package":"Model.organism.animal.1.0","publication_date":"2021-05-13T06:39:37.300","sample_ids":[{"label":"Sample 
name","value":"Common_carp"}],"status":{"status":"live","when":"2021-05-13T06:39:37.300"},"submission_date":"2020-12-05T07:09:04.477"},"blast_url":"https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch\u0026PROG_DEF=blastn\u0026BLAST_SPEC=GDH_GCF_018340385.1","paired_assembly":{"accession":"GCA_018340385.1","only_genbank":"7 unlocalized scaffolds on chromosome MT","only_refseq":"chromosome MT","status":"current"},"refseq_category":"representative genome","release_date":"2021-05-12","sequencing_tech":"PacBio; Oxford Nanopore; Illumina HiSeq","submitter":"Chinese Academy of Fishery Sciences"},"assembly_stats":{"contig_l50":229,"contig_n50":1558716,"gc_count":"620441384","gc_percent":37,"genome_coverage":"184.8x","number_of_component_sequences":6700,"number_of_contigs":19837,"number_of_organelles":1,"number_of_scaffolds":6700,"scaffold_l50":24,"scaffold_n50":29545497,"total_number_of_chromosomes":50,"total_sequence_length":"1680118328","total_ungapped_length":"1672146419"},"current_accession":"GCF_018340385.1","organelle_info":[{"description":"Mitochondrion","submitter":"Chinese Academy of Fishery Sciences","total_seq_length":"16575"}],"organism":{"common_name":"common carp","infraspecific_names":{"isolate":"SPL01"},"organism_name":"Cyprinus carpio","tax_id":7962},"paired_accession":"GCA_018340385.1","source_database":"SOURCE_DATABASE_REFSEQ","wgs_info":{"master_wgs_url":"https://www.ncbi.nlm.nih.gov/nuccore/JAEOAB000000000.1","wgs_contigs_url":"https://www.ncbi.nlm.nih.gov/Traces/wgs/JAEOAB01","wgs_project_accession":"JAEOAB01"}},{"accession":"GCA_018340385.1","assembly_info":{"assembly_level":"Chromosome","assembly_method":"wtdbg v. 2; quickmerge v. 
1","assembly_name":"ASM1834038v1","assembly_status":"current","assembly_type":"haploid","bioproject_accession":"PRJNA682709","bioproject_lineage":[{"bioprojects":[{"accession":"PRJNA682709","title":"Cyprinus carpio isolate:SPL01 Genome sequencing and assembly"}]}],"biosample":{"accession":"SAMN17005855","attributes":[{"name":"isolate","value":"SPL01"},{"name":"dev_stage","value":"adult"},{"name":"sex","value":"not collected"},{"name":"tissue","value":"muscle"}],"bioprojects":[{"accession":"PRJNA682709"}],"description":{"organism":{"organism_name":"Cyprinus carpio","tax_id":7962},"title":"Model organism or animal sample from Cyprinus carpio"},"last_updated":"2021-05-13T06:39:37.300","models":["Model organism or animal"],"owner":{"contacts":[{}],"name":"Chinese Academy of Fishery Sciences"},"package":"Model.organism.animal.1.0","publication_date":"2021-05-13T06:39:37.300","sample_ids":[{"label":"Sample name","value":"Common_carp"}],"status":{"status":"live","when":"2021-05-13T06:39:37.300"},"submission_date":"2020-12-05T07:09:04.477"},"blast_url":"https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastSearch\u0026PROG_DEF=blastn\u0026BLAST_SPEC=GDH_GCA_018340385.1","paired_assembly":{"accession":"GCF_018340385.1","annotation_name":"NCBI Cyprinus carpio Annotation Release 101","only_genbank":"7 unlocalized scaffolds on chromosome MT","only_refseq":"chromosome MT","status":"current"},"release_date":"2021-05-12","sequencing_tech":"PacBio; Oxford Nanopore; Illumina HiSeq","submitter":"Chinese Academy of Fishery 
Sciences"},"assembly_stats":{"contig_l50":229,"contig_n50":1558716,"gc_count":"620441384","gc_percent":37,"genome_coverage":"184.8x","number_of_component_sequences":6700,"number_of_contigs":19837,"number_of_organelles":1,"number_of_scaffolds":6700,"scaffold_l50":24,"scaffold_n50":29545497,"total_number_of_chromosomes":50,"total_sequence_length":"1680118328","total_ungapped_length":"1672146419"},"current_accession":"GCA_018340385.1","organelle_info":[{"description":"Mitochondrion","submitter":"Chinese Academy of Fishery Sciences"}],"organism":{"common_name":"common carp","infraspecific_names":{"isolate":"SPL01"},"organism_name":"Cyprinus carpio","tax_id":7962},"paired_accession":"GCF_018340385.1","source_database":"SOURCE_DATABASE_GENBANK","wgs_info":{"master_wgs_url":"https://www.ncbi.nlm.nih.gov/nuccore/JAEOAB000000000.1","wgs_contigs_url":"https://www.ncbi.nlm.nih.gov/Traces/wgs/JAEOAB01","wgs_project_accession":"JAEOAB01"}}],"total_count": 2}
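As a quick check, the report can be searched for any key whose name mentions "publication". The sketch below is only illustrative: it assumes a heavily trimmed copy of the first report above (a handful of fields kept for brevity), and the helper name `find_keys` is hypothetical, not part of the datasets tool.

```python
import json

# Trimmed excerpt of the first report above; most fields are omitted for brevity.
REPORT_JSON = """
{"reports": [{"accession": "GCF_018340385.1",
  "assembly_info": {"bioproject_accession": "PRJNA682709",
    "biosample": {"accession": "SAMN17005855",
      "publication_date": "2021-05-13T06:39:37.300"},
    "submitter": "Chinese Academy of Fishery Sciences"},
  "organism": {"organism_name": "Cyprinus carpio", "tax_id": 7962}}]}
"""


def find_keys(node, needle, path=""):
    """Yield (path, value) for every dict key whose name contains `needle`."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if needle in key:
                yield child, value
            # Keep descending so nested matches are found as well.
            yield from find_keys(value, needle, child)
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_keys(item, needle, f"{path}[{i}]")


hits = list(find_keys(json.loads(REPORT_JSON), "publication"))
for path, value in hits:
    print(path, "=", value)
```

The only match is the BioSample `publication_date`, which is a release timestamp for the sample record rather than a reference to an article.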
Describe the solution you'd like
It would be nice if the output included a meaningful "publication" field identifying the publication that should be cited when that data source is used.
Thank you!
Hi conchoecia,
Thank you for opening this issue. While we won't be able to get to it any time soon, please know that we are exploring better ways to enhance data attribution. We'll keep this issue open until it is resolved.
Nuala
Nuala A. O'Leary, PhD
Product Owner, NCBI Datasets
National Center for Biotechnology Information, NLM, NIH, DHHS
Thanks for the response, @olearyna! Good luck with the development.