feat: BV-BRC support
Context
ViPR is being replaced by BV-BRC, with the final transfer possibly occurring by the end of the year (2022-12-31). This might disrupt pathogen builds.
Description
For any pathogen builds depending on ViPR (such as zika), we should make sure we can do one (or a combination) of the following:
- pull the equivalent GenomicFastaResults.fasta
- modify any required scripts for BV-BRC datasets
Possible solution
So far the user interface seems pretty equivalent:
However, we may need to modify the FASTA headers.
Notice that the header format differs from ViPR's and, unlike in ViPR, cannot be customized (example from dengue):
head BVBRC_genome_sequence.fasta
>accn|KY829115 Dengue virus 1 isolate H.sapiens-wt/BLM/2016/MA-WGS16-006-SER, complete genome. [Dengue virus 1 H.sapiens-wt/BLM/2016/MA-WGS16-006-SER strain Dengue virus 1/H.sapiens-wt/BLM/2016/MA-WGS16-006-SER | 11053.9479]
agttgttagtctacgtggaccgacaagaacagtttcgaatcggaagcttgcttaacgtag
ttctaacagttttttattagagagcagatctctgatgaacaaccaacggaaaaagacggg
tcgaccgtctttcaatatgctgaaacgcgcgagaaaccgcgtgtcaactggttcacagtt
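To make the header difference concrete, here is a minimal sketch of pulling the accession out of a BV-BRC style header. It assumes every header follows the `>accn|<accession> <description> [<genome name> | <id>]` shape seen in the dengue example above; the function name `parse_bvbrc_header` and the key names in the returned dict are hypothetical, and the meaning of the trailing number is a guess based on that one example.

```python
import re

# Pattern inferred from the single dengue header above; other BV-BRC
# downloads may not follow it, so treat this as an assumption.
BVBRC_HEADER = re.compile(
    r"^>?accn\|(?P<accession>\S+)\s+"   # accession right after "accn|"
    r"(?P<description>.*?)\s*"          # free-text description
    r"\[(?P<genome_name>.*)\|\s*"       # bracketed genome name
    r"(?P<trailing_id>[\d.]+)\]\s*$"    # trailing numeric ID (meaning assumed)
)


def parse_bvbrc_header(line):
    """Split one BV-BRC style FASTA header line into its parts."""
    match = BVBRC_HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized header: {line!r}")
    return {key: value.strip() for key, value in match.groupdict().items()}
```

The accession is the only part that also appears in our current headers, so it is the natural key for joining sequences to separately downloaded metadata.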
Instead, we may need to download and process the tabular data to match our current GenomicFastaResults.fasta headers (example from Zika):
less GenomicFastaResults.fasta
>KY241742|ZIKV_SG_072|NA|2016_08_28|Human|Singapore|Asian|Zika_virus
GAATCAGACTGCGACAGTTCGAGTTTGAAGCGAAAGCTAGCAACAGTATCAACAGGTTTTATTTTGGATT
TGGAAACGAGAGTTTCTGGTCATGAAAAACCCAAAAAAGAAATCCGGAGGATTCCGGATTGTCAATATGC
TAAAACGCGGAGTAGCCCGTGTGAGCCCCTTTGGGGGCTTGAAGAGGCTGCCAGCCGGACTTCTGCTGGG
TCATGGGCCCATCAGGATGGTCTTGGCGATTCTAGCCTTTTTGAGGTTCACGGCAATCAAGCCATCACTG
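A rough sketch of that processing step, assuming the tabular download is tab-separated and has one row per accession. The column names in `FIELD_ORDER` are hypothetical placeholders (the real BV-BRC export will likely use different names), and the field order is only inferred from the Zika header above.

```python
import csv
import io

# Hypothetical column names, ordered to mirror the Zika header above.
# Rename these to whatever the BV-BRC table actually provides.
FIELD_ORDER = [
    "accession", "strain", "segment", "collection_date",
    "host", "country", "subtype", "species",
]


def vipr_style_header(row):
    """Build a pipe-delimited header from one metadata row (a dict)."""
    parts = []
    for field in FIELD_ORDER:
        value = (row.get(field) or "").strip()
        if field == "collection_date":
            # The existing headers use underscores in dates (2016_08_28).
            value = value.replace("-", "_")
        # Missing values become "NA"; spaces become underscores.
        parts.append(value.replace(" ", "_") if value else "NA")
    return ">" + "|".join(parts)


def headers_by_accession(tsv_text):
    """Map each accession in a TSV string to its rebuilt header."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["accession"]: vipr_style_header(row) for row in reader}
```

With a mapping like this, each sequence in the BV-BRC FASTA could have its header swapped by looking up the accession, leaving the downstream scripts unchanged.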
Alternatively, someone could do a deep dive into the BV-BRC CLI:
- https://www.bv-brc.org/docs/cli_tutorial/index.html
Just a note that the ViPR website now redirects to BV-BRC, so any ViPR instructions may be out of date.
Closing this, since instead of adding BV-BRC support, we've gone in the direction of using Entrez or NCBI datasets in the pathogen repo-guide.
- https://github.com/nextstrain/pathogen-repo-guide/blob/89b3c5dbc9b8f6a009f4a19c3ac56113bc5511ee/ingest/rules/fetch_from_ncbi.smk#L16-L26