serratus icon indicating copy to clipboard operation
serratus copied to clipboard

Update benchmarking script with new CLI

Open victorlin opened this issue 4 years ago • 7 comments

Opening this PR to track progress for #27. Keeping as draft until ready.

TODO:

  • split into 2 scripts
    • cov_benchmark.py
      • [x] design high-level CLI
      • [x] implement simulation parameters
      • [x] implement alignment parameters
      • [x] implement custom subprocess parameters (msbar,art_illumina, bowtie2)
      • [x] convert counts to proportions
    • pangenome_cov_benchmark.py
      • [x] per genome in pangenome: generate single-record FASTA, run cov_benchmark.py with predefined parameters
      • [x] aggregate results in .csv file
  • [x] update README
  • [ ] write containerized unit tests (?)

victorlin avatar Apr 25 '20 00:04 victorlin

PR closes #27

mathemage avatar Apr 26 '20 13:04 mathemage

I was able to do most of the finalizing work for this today. A few more unit tests that I'm figuring out how to implement since it doesn't seem like msbar supports a random seed to generate reproducible output.

The updated README is mostly technical without much insight on the biological context; @ababaian if you have any comments/suggestions feel free to modify directly or let me know!

victorlin avatar May 03 '20 21:05 victorlin

Hey @victorlin, can you remove the fq files from the repo and instead host them on the s3 bucket. There is a test_data folder which these would fit in.

--

I'm a bit tied up at the moment, I'll do a proper review tomorrow

ababaian avatar May 03 '20 23:05 ababaian

@ababaian done - moved all the test input files to s3://serratus-public/test-data/benchmarker/. Also added a setup.sh script to copy the files into test/benchmarker/local/. Using local for the folder name since it's targeted by our repo's .gitignore.

victorlin avatar May 04 '20 05:05 victorlin

A lot has happened since this was started. Is the benchmarker still needed?

The last task is unit testing msbar, but it seems non-trivial. We could merge this PR in and add a separate task, or just close out to make room for other exciting things.

victorlin avatar May 24 '20 17:05 victorlin

IMO we don't need a mapping benchmark for the main Serratus search, we have settled on how to run bowtie2 and are not likely to revisit benchmarking of mappers.

rcedgar avatar May 24 '20 17:05 rcedgar

Ironically enough I do think we /should/ revisit this as there will be ways to improve on --very-sensitive-local I just think on the totem pole of priorites this has gotten bumped down quite a bit as more pressing things always seem to be there. I'd say let's leave this open and swing back to it when time is a luxury we have.

ababaian avatar May 24 '20 18:05 ababaian