ncov-ingest

A pipeline that ingests SARS-CoV-2 (i.e. nCoV) data from GISAID and GenBank, transforms it, stores it on S3, and triggers Nextstrain nCoV rebuilds.

32 ncov-ingest issues

Nextclade no longer errors when the input sequence file is empty. See https://github.com/nextstrain/nextclade/issues/1422

## Checklist

- [ ] Checks pass:
  - [ ] https://github.com/nextstrain/ncov-ingest/actions/runs/8157039438/job/22295811535
  - [ ] https://github.com/nextstrain/ncov-ingest/actions/runs/8157034823/job/22295796736

Currently, dependencies are defined rather loosely: https://github.com/nextstrain/ncov-ingest/blob/e6c904f1e36c6f833ee0b1b5b52b0aaa8570b6f8/requirements.txt#L1-L8

From https://github.com/nextstrain/ncov-ingest/pull/416#discussion_r1298885003:

> Ideally, it would still be nice to lock these deps so they're not changing and liable to break on new...

enhancement
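
As a rough illustration of what "defined rather loosely" means, the Python sketch below flags requirement lines that do not pin an exact version with `==`. The helper `unpinned_requirements` is hypothetical and not part of ncov-ingest. It assumes a plain pip-style requirements file, treats any `==` specifier as pinned, and only looks at direct dependencies; a real lock would also pin transitive dependencies, which this check does not cover.

```python
import re


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version with ``==``.

    Hypothetical helper for illustration; any ``==`` specifier counts as pinned.
    """
    unpinned = []
    for raw in text.splitlines():
        # Drop comments: a "#" at line start or preceded by whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        # Ignore environment markers, e.g. 'pkg==1.0; python_version < "3.9"'.
        spec = line.split(";", 1)[0]
        if "==" not in spec:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Made-up package names, not the repo's actual dependencies.
    example = "pkg-a==1.2.3\npkg-b>=1.0\npkg-c\n"
    print(unpinned_requirements(example))  # ['pkg-b>=1.0', 'pkg-c']
```

A check like this could run in CI to catch new unpinned entries, while the pinned versions themselves would still have to come from some locking step.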

[VirusSeq](https://virusseq-dataportal.ca/) is a Canadian data portal that hosts over 500,000 SARS-CoV-2 sequences that are not on GenBank. It would be amazing if they could be incorporated into the ncov open...

enhancement

The `fetch-from-biosample` script was added before we fully transitioned to using the NCBI Datasets CLI. Today I noticed in the NCBI Datasets CLI docs for the [datasets download virus genome...

enhancement

### Context

It's possible to get the purpose of sampling from GenBank via `datasets summary virus genome taxon sars-cov-2`, see https://github.com/GenSpectrum/LAPIS/issues/328#issuecomment-1673639689 It would be nice if we parsed that field...

enhancement

When we add new clades, we need to add them to `clade-legacy-mapping.yml` in this repo. This is easy to forget as all the other changes are in `ncov` directly. If...

Most RKI sequences have lacked a submission date since the move to the new format. I've opened an issue in the RKI repo to let them know that most submission dates have gone...

### Context

[Requested](https://bedfordlab.slack.com/archives/CTZKJC7PZ/p1687865743308309) by @corneliusroemer as a convenient way to inspect the build logs when the build fails on AWS Batch.

### Possible solution

According to the [Snakemake docs](https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#onstart-onsuccess-and-onerror-handlers),...

enhancement

### Context

See "Future work" item in https://github.com/nextstrain/ncov-ingest/pull/405

### Description

Add a reasonable default for `clade_legacy` so that adding a new clade doesn't require changes in ingest. E.g., we could use the...

enhancement
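
One way to picture "a reasonable default" is a lookup that falls back instead of failing when a clade is missing from the mapping. The Python sketch below assumes `clade-legacy-mapping.yml` has already been loaded into a plain dict of clade name to legacy name; the helper `clade_legacy` and its fallback (reusing the clade name unchanged) are hypothetical choices for illustration, not the default the issue goes on to propose.

```python
import logging

logger = logging.getLogger(__name__)


def clade_legacy(clade: str, mapping: dict[str, str]) -> str:
    """Return the legacy name for `clade`, falling back if it is unmapped.

    Hypothetical fallback: reuse the clade name itself, so a newly added
    clade does not break ingest before the mapping file is updated.
    """
    legacy = mapping.get(clade)
    if legacy is None:
        # Log so that missing entries remain visible in the build output.
        logger.warning("No legacy mapping for clade %r; using it unchanged", clade)
        return clade
    return legacy


if __name__ == "__main__":
    # Made-up names, not entries from the real mapping file.
    mapping = {"clade-x": "legacy-x"}
    print(clade_legacy("clade-x", mapping))  # legacy-x
    print(clade_legacy("clade-y", mapping))  # clade-y
```

Logging a warning rather than raising keeps the pipeline running while still surfacing clades that may deserve an explicit mapping entry.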

Most Chinese sequences are now open; this should be a useful addition to open data. UShER now pulls these, and it seems to work. We can probably copy the approach, see...

enhancement