What should we benchmark?
Following the same process as sourmash-bio/sourmash#2410, we will benchmark the charcoal workflow with the demo directory and/or the six signatures included in that issue. Suggested runs:
- Run the demo repo
- Run each sequence alone
- Run a variety of sequence sets, from small to large
- Run all six together
It may be interesting to also compare the results of `sourmash search --containment` to `charcoal.contigs_list_contaminents.py` in this repo.
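As a rough sketch of what that comparison could use (this is not charcoal's own code, and the file names are placeholders), the containment value that `sourmash search --containment` reports for a pair of signatures can be reproduced with the sourmash Python API:

```python
# Rough sketch: compute containment between a query genome signature and a
# putative contaminant signature via the sourmash Python API.
# File names are placeholders; the signatures must share ksize and scaled
# for contained_by() to be meaningful.
import sourmash

query = next(iter(sourmash.load_file_as_signatures("query-genome.sig")))
match = next(iter(sourmash.load_file_as_signatures("contaminant.sig")))

# Containment of the query in the match, and the reverse.
print("query contained by match:", query.minhash.contained_by(match.minhash))
print("match contained by query:", match.minhash.contained_by(query.minhash))
```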
It would also be interesting to compare the accuracy of `sourmash gather` and genome-grist MinSetCov taxonomic outputs with and without charcoal.
It sounds like you might be trying to benchmark both computational performance and classification performance. Those are pretty different things.
I don't think that charcoal has any individually expensive steps or computationally complex scripts; it's just that the workflow overall involves an awful lot of steps, much like genome-grist. That may change your benchmarking strategy.
I agree! They are completely different benchmarks. Mostly wanted to jot down the notion before it left me forever. Additionally, since we will be writing an analytical benchmark for computational results, we will have a foundation to come back to in the future when we are ready for a biological benchmark.
Would you suggest forking and adding benchmark directives throughout the Snakefile instead of a global benchmark?
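For reference, a per-rule `benchmark:` directive would look roughly like this (the rule name, file paths, database, and shell command here are made up for illustration, not taken from charcoal's Snakefile):

```
rule contig_search:
    input:
        sig="outputs/{genome}.sig",
        db="databases/contaminants.sbt.zip",
    output:
        csv="outputs/{genome}.search.csv",
    # Snakemake writes wall-clock time and memory stats for each run of this rule here:
    benchmark:
        "benchmarks/{genome}.contig_search.tsv"
    shell:
        "sourmash search --containment {input.sig} {input.db} -o {output.csv}"
```

Each benchmarked rule then writes its own TSV of runtime and peak memory that can be aggregated across the workflow afterwards; the global alternative would be wrapping the whole snakemake invocation in GNU `time`, which gives one overall number but no per-rule breakdown.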