varCA
a smaller test dataset
Our current test dataset comprises all of chr1 in two different samples: the Jurkat sample and the MOLT4 cell line. It takes about an hour to run the entire pipeline with this dataset.
Ideally, we would have a dataset that runs in under 10 minutes or so. This could then be incorporated into a GitHub CI pipeline that runs automatically upon release of each major and minor version increment, so that we know when a change we've made to the code leads to a change in the results.
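To make "a change in the results" concrete, here is a minimal sketch of a check that CI could run: it compares the files from a fresh run against a stored set of expected outputs by checksum. The function names and the two-directory layout are assumptions for illustration, not something varCA currently provides.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_outputs(expected_dir: Path, actual_dir: Path) -> List[str]:
    """List expected files that are missing from, or differ in, the new run.

    Both directories are hypothetical: one holds outputs committed with the
    test dataset, the other holds outputs from the run being checked.
    """
    changed = []
    for expected in sorted(p for p in expected_dir.rglob("*") if p.is_file()):
        rel = expected.relative_to(expected_dir)
        actual = actual_dir / rel
        if not actual.is_file() or file_digest(expected) != file_digest(actual):
            changed.append(rel.as_posix())
    return changed
```

A byte-for-byte comparison is probably too strict for any output that embeds a timestamp or a run-specific path (gzip headers are one common case), so those files would likely need a content-aware comparison instead.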
- [x] find SNVs and indels supported by all callers
- [x] choose just one or two peaks that overlap those variants from each of the two samples
- [x] subset the example dataset to reads that only overlap those peaks
- [x] also try to subset the reference genome that is packaged with the example data, since the ref genome currently appears to be the largest file
- [x] rerun the pipeline with the smaller dataset and tweak the dataset as necessary to make it run quickly
- [ ] use `snakemake --generate-unit-tests` to create a bunch of tests that can be executed using `pytest`
  - I'm running into issues with this. It doesn't work for outputs marked as `pipe` and there are some problems with other directories (see snakemake/snakemake#1104)
- [ ] fix issues and ensure test coverage is appropriate
- [ ] remove any unnecessary tests to ensure the test directory is small and can be properly included in version history (edit: this won't be possible, after all - b/c the test directory has to include the outputs of each rule ugh)
- [ ] (optionally) create a GitHub action like this one to execute `pytest` upon each major or minor version increment and confirm the tests pass successfully
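For reference, the subsetting steps in the checklist (pick peaks that overlap the shared variants, then keep only the reads that overlap those peaks) boil down to an interval-overlap filter. Below is a minimal pure-Python sketch of that logic on toy coordinates. The `Interval` type and function names are hypothetical, and the real subsetting would more likely be done on the BAM/BED files with a tool such as samtools or bedtools.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval; coordinates assumed 0-based, half-open (BED-style)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same contig."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def peaks_with_variants(peaks: Iterable[Interval],
                        variants: Iterable[Interval]) -> List[Interval]:
    """Keep only the peaks that overlap at least one variant."""
    variants = list(variants)
    return [p for p in peaks if any(overlaps(p, v) for v in variants)]


def reads_in_peaks(reads: Iterable[Interval],
                   peaks: Iterable[Interval]) -> List[Interval]:
    """Keep only the reads that overlap at least one of the chosen peaks."""
    peaks = list(peaks)
    return [r for r in reads if any(overlaps(r, p) for p in peaks)]


# Toy example (made-up coordinates, not taken from the actual dataset)
peaks = [Interval("chr1", 100, 200), Interval("chr1", 500, 600)]
variants = [Interval("chr1", 150, 151)]  # a single SNV
reads = [Interval("chr1", 90, 140), Interval("chr1", 550, 590)]

chosen = peaks_with_variants(peaks, variants)
kept = reads_in_peaks(reads, chosen)
```

This quadratic scan is fine for one or two peaks per sample. Note that subsetting the reference genome to the same regions would also shift coordinates, which this sketch does not handle.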