
feat: BV-BRC support

Open · j23414 opened this issue 3 years ago · 1 comment

Context

ViPR is being replaced by BV-BRC, with the final transfer possibly occurring by the end of the year (2022-12-31), which might disrupt pathogen builds that download data from ViPR.

(Screenshot of the BV-BRC website)

Description

For any pathogen builds depending on ViPR (such as zika), we should make sure we can do one (or a combination) of the following:

  • pull the equivalent GenomicFastaResults.fasta
  • modify any required scripts for BV-BRC datasets

Possible solution

So far, the BV-BRC user interface seems roughly equivalent to ViPR's:

(Screenshots: BV-BRC_zika, BV-BRC_zika_download)

However, we may need to modify the FASTA headers.

Notice that the BV-BRC header format differs from ViPR's and, unlike in ViPR, cannot be customized. Example from dengue:

head BVBRC_genome_sequence.fasta
>accn|KY829115   Dengue virus 1 isolate H.sapiens-wt/BLM/2016/MA-WGS16-006-SER, complete genome.   [Dengue virus 1 H.sapiens-wt/BLM/2016/MA-WGS16-006-SER strain Dengue virus 1/H.sapiens-wt/BLM/2016/MA-WGS16-006-SER | 11053.9479]
agttgttagtctacgtggaccgacaagaacagtttcgaatcggaagcttgcttaacgtag
ttctaacagttttttattagagagcagatctctgatgaacaaccaacggaaaaagacggg
tcgaccgtctttcaatatgctgaaacgcgcgagaaaccgcgtgtcaactggttcacagtt
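
The header above seems to follow a fixed layout: `accn|` plus the accession, a free-text description, and a bracketed organism/strain string ending in `| <id>`, where the trailing value (`11053.9479`) looks like a BV-BRC genome ID. Here is a minimal Python sketch that splits such a header into those parts. The layout is inferred from this single dengue example, so the regex is an assumption, and `parse_bvbrc_header` is a hypothetical helper, not part of any BV-BRC tool.

```python
import re
from typing import NamedTuple, Optional


class BvbrcHeader(NamedTuple):
    accession: str
    description: str
    genome_id: Optional[str]


# Assumed layout, inferred from a single example header:
#   >accn|<accession>   <description>   [<organism/strain text> | <genome id>]
_HEADER_RE = re.compile(
    r"^>accn\|(?P<accession>\S+)\s*(?P<description>.*?)\s*(?:\[(?P<bracket>.*)\])?\s*$"
)


def parse_bvbrc_header(line: str) -> BvbrcHeader:
    """Split a BV-BRC FASTA header into accession, description and genome ID."""
    match = _HEADER_RE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"unrecognized BV-BRC header: {line!r}")
    bracket = match.group("bracket")
    # The genome ID is taken to be whatever follows the last '|' inside the brackets.
    genome_id = bracket.rsplit("|", 1)[1].strip() if bracket and "|" in bracket else None
    return BvbrcHeader(match.group("accession"), match.group("description"), genome_id)
```

Keeping the accession (and the apparent genome ID) is what would let the sequences be joined back to BV-BRC's tabular metadata.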

Instead, we may need to download the tabular metadata from BV-BRC and process it to reproduce the headers of our current GenomicFastaResults.fasta.

Example from Zika:

less GenomicFastaResults.fasta
>KY241742|ZIKV_SG_072|NA|2016_08_28|Human|Singapore|Asian|Zika_virus
GAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAGGTTTTATTTTGGATT
TGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAAAAGAAATCCGGAGGATTCCGGATTGTCAATATGC
TAAAACGCGGAGTAGCCCGTGTGAGCCCCTTTGGGGGCTTGAAGAGGCTGCCAGCCGGACTTCTGCTGGG
TCATGGGCCCATCAGGATGGTCTTGGCGATTCTAGCCTTTTTGAGGTTCACGGCAATCAAGCCATCACTG
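
One way to rebuild these headers is to join the BV-BRC sequences to a tabular metadata export by accession and emit pipe-delimited fields. The sketch below assumes the field order accession | strain | segment | date | host | country | genotype | species, inferred from the Zika header above, and uses hypothetical column names (`strain`, `collection_date`, and so on); the real BV-BRC export columns would need to be mapped onto them. Empty values become `NA` and whitespace becomes underscores, matching what the example shows (`NA`, `Zika_virus`).

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

# Field order inferred from the example ViPR header (an assumption):
#   accession|strain|segment|date|host|country|genotype|species
# Column names are hypothetical; map the real BV-BRC export columns onto them.
VIPR_FIELDS: List[str] = [
    "accession", "strain", "segment", "collection_date",
    "host", "country", "genotype", "species",
]


def load_metadata(tsv_text: str, key: str = "accession") -> Dict[str, Dict[str, str]]:
    """Index tab-separated metadata rows by the given key column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row[key]: row for row in reader}


def _clean(value: str) -> str:
    """Map empty values to 'NA' and collapse whitespace runs into underscores."""
    value = (value or "").strip()
    return "_".join(value.split()) if value else "NA"


def vipr_header(row: Dict[str, str]) -> str:
    """Build a pipe-delimited, ViPR-style FASTA header from one metadata row."""
    fields = []
    for name in VIPR_FIELDS:
        value = _clean(row.get(name, ""))
        if name == "collection_date":
            # e.g. 2016-08-28 -> 2016_08_28, as in the example header
            value = value.replace("-", "_")
        fields.append(value)
    return ">" + "|".join(fields)


def rewrite_fasta(lines: Iterable[str], metadata: Dict[str, Dict[str, str]]) -> Iterator[str]:
    """Replace '>accn|<accession> ...' headers; pass sequence lines through unchanged."""
    for line in lines:
        if line.startswith(">accn|"):
            accession = line[len(">accn|"):].split()[0]
            # Accessions missing from the metadata still get a header, padded with NA.
            row = {**metadata.get(accession, {}), "accession": accession}
            yield vipr_header(row) + "\n"
        else:
            yield line
```

Rewriting only header lines and streaming sequence lines through unchanged means a large FASTA file never has to be held in memory at once.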

Alternatively, someone could do a deep dive into the BV-BRC CLI:

  • https://www.bv-brc.org/docs/cli_tutorial/index.html

j23414 · Oct 14 '22

Just a note that the ViPR website now redirects to BV-BRC, so any ViPR instructions may be out of date.

j23414 · Feb 27 '23

Closing this: instead of adding BV-BRC support, we've moved toward fetching data from NCBI with Entrez or NCBI Datasets in the pathogen-repo-guide.

  • https://github.com/nextstrain/pathogen-repo-guide/blob/89b3c5dbc9b8f6a009f4a19c3ac56113bc5511ee/ingest/rules/fetch_from_ncbi.smk#L16-L26
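
For reference, the Entrez route comes down to E-utilities requests: ESearch to find matching records and EFetch with `rettype=fasta` to download the sequences. The sketch below only builds those request URLs with the Python standard library. The search term is illustrative, the helper names are hypothetical, and this is not taken from the linked Snakemake rule, which may fetch data differently.

```python
from typing import Iterable
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, db: str = "nuccore", retmax: int = 100) -> str:
    """URL for an ESearch request that returns the UIDs matching `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_fasta_url(ids: Iterable[str], db: str = "nuccore") -> str:
    """URL for an EFetch request that returns the given records as FASTA text."""
    params = {"db": db, "id": ",".join(ids), "rettype": "fasta", "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

Actually sending the requests (for example with `urllib.request.urlopen`) is left out here; NCBI's usage guidelines on request rates would apply.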

j23414 · Jul 23 '24