bgcflow
Add sketch size parameter to MASH
Some of the datasets I worked with were very diverse. Sketch size is a key parameter for getting accurate predictions on diverse datasets.
I increased the sketch size with the -s option from 1000 to 10000 and got much better results. This sketch size is also used in the paper https://www.nature.com/articles/s42003-020-01626-5#Sec11.
I recommend increasing the sketch size to 10,000 in the default MASH run.
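For illustration, here is a rough sketch of how a configurable sketch size could be passed through to `mash sketch`. I don't know how bgcflow currently builds its MASH command, so the function name `build_mash_sketch_cmd`, the `sketch_size` parameter, and the file names are all hypothetical. Only the `-s` (sketch size) and `-o` (output prefix) options come from Mash itself.

```python
import shlex


def build_mash_sketch_cmd(fasta_paths, out_prefix, sketch_size=10000):
    """Build the argument list for `mash sketch` with an explicit sketch size.

    Hypothetical helper: this is not taken from bgcflow.
    """
    if sketch_size < 1:
        raise ValueError("sketch_size must be a positive integer")
    # -s sets the number of min-hashes kept per sketch (Mash defaults to 1000);
    # -o sets the output prefix for the resulting .msh file.
    return ["mash", "sketch", "-s", str(sketch_size), "-o", out_prefix, *fasta_paths]


# Placeholder input files; replace with real genome FASTA paths.
cmd = build_mash_sketch_cmd(["genome_a.fna", "genome_b.fna"], "all_genomes")
print(shlex.join(cmd))
```

The command is only built and printed here, not executed. Exposing the value as a parameter would let users go back to 1000 for very large, closely related datasets where runtime matters more.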