strobealign
Run accuracy tests in CI
The current test dataset consists of the first 100'000 reads of SRR6055476. As it is real sequencing data, we do not have the truth available and thus cannot assess accuracy. We can only see whether something changed, but it would be very helpful to also be able to see whether the change made the result more or less accurate.
The suggestion is to test strobealign on simulated reads in CI.
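To illustrate what such an accuracy check could look like once the true origin of each read is known, here is a minimal sketch. It is not strobealign's actual evaluation code: the function names, the `tolerance` parameter, and the shape of the `truth` dictionary are all assumptions made for illustration, and it only handles single-end reads with unique names.

```python
def parse_sam_positions(sam_lines):
    """Return {read_name: (reference_name, 1-based position)} for primary, mapped records."""
    positions = {}
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag, ref, pos = fields[0], int(fields[1]), fields[2], int(fields[3])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x4 or flag & 0x900:
            continue
        positions[name] = (ref, pos)
    return positions


def accuracy(truth, predicted, tolerance=5):
    """Fraction of simulated reads mapped to the true reference within `tolerance` bases.

    `truth` maps read name -> (reference_name, position); this layout is an
    assumption, a real simulator may encode the origin differently.
    Unmapped reads count as incorrect.
    """
    if not truth:
        return 0.0
    correct = 0
    for name, (true_ref, true_pos) in truth.items():
        hit = predicted.get(name)
        if hit is not None and hit[0] == true_ref and abs(hit[1] - true_pos) <= tolerance:
            correct += 1
    return correct / len(truth)
```

A CI job could compute this number for the baseline and for the changed version and report the difference, which would show whether a change made the result more or less accurate.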
My initial reason for using real data was that it is very easy to download from the SRA within a GitHub workflow. For simulated reads, we need to either 1) host the data somewhere or 2) create the simulated reads within the workflow.
It would be great for reproducibility if we could make option 2 work. The simulation would only happen the first time the workflow runs, and the data would then be put into the GitHub cache, so on subsequent runs this would be just as fast as the first option.
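The simulate-once-then-cache idea can be sketched as follows. This is only an illustration of the logic, not a workflow definition: `cache_key`, `ensure_simulated_reads`, and the `simulate` callback are hypothetical names, and in a real workflow the cache directory would be whatever path the GitHub cache restores. The key is derived from the simulation parameters, so changing them triggers a fresh simulation.

```python
import hashlib
import json
from pathlib import Path


def cache_key(params):
    """Derive a short, stable key from the simulation parameters."""
    blob = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]


def ensure_simulated_reads(cache_dir, params, simulate):
    """Return the path to the simulated reads, simulating only on a cache miss.

    `simulate(params, path)` is a hypothetical callback that writes the
    reads to `path`; any read simulator could be wrapped this way.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"sim-{cache_key(params)}.fastq"
    if not path.exists():
        simulate(params, path)
    return path
```

Using a fixed random seed as one of the parameters would keep the simulated data identical whenever the cache has to be rebuilt.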