ska.rust
different results from ska2 0.3.2 and 0.3.6
I was trying to run the latest version, 0.3.6 (installed today with conda install -c bioconda ska2), on our data, and the results differ from 0.3.2. The same commands were used for all of the following analyses:
ska build -o seqs_ska2_strict --min-count 4 --min-qual 20 --threads 4 -k 31 --qual-filter strict -f ska2_input.tsv
ska distance --filter-ambiguous seqs_ska2_strict.skf > distances_ska2_strict.txt
We have 40 samples:
0.3.2: [figure not shown]
0.3.6: [figure not shown]
To help debug, I took 5 samples from the cluster in the bottom right corner and subsampled them to 1/5 of the read counts so the files are smaller. You can find them here: https://github.com/danrlu/debug_data/tree/main/ska
With 5 samples (see ska2_input_more.tsv):
0.3.2:
Sample1 Sample2 Distance Mismatches
pt59 pt60 6.00 0.19603
pt59 pt61 5.00 0.21242
pt59 pt74 7.00 0.21048
pt59 pt75 7.00 0.19414
pt60 pt61 7.00 0.21246
pt60 pt74 7.00 0.21003
pt60 pt75 7.00 0.19471
pt61 pt74 2.00 0.22424
pt61 pt75 6.00 0.21081
pt74 pt75 7.00 0.21033
0.3.6:
Sample1 Sample2 Distance Mismatches
pt59 pt60 20.00 0.19603
pt59 pt61 21.17 0.21242
pt59 pt74 23.17 0.21048
pt59 pt75 24.00 0.19414
pt60 pt61 21.50 0.21246
pt60 pt74 20.00 0.21003
pt60 pt75 19.50 0.19471
pt61 pt74 16.17 0.22424
pt61 pt75 20.67 0.21081
pt74 pt75 19.67 0.21033
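In case it helps with triage, here is a minimal Python sketch (not part of ska) that compares the two 5-sample tables above pair by pair. It assumes the four whitespace-separated columns shown in the output; `parse` and `compare` are hypothetical helper names used only for this illustration.

```python
# Distance tables copied from the 5-sample runs above (header lines dropped).
OLD = """\
pt59 pt60 6.00 0.19603
pt59 pt61 5.00 0.21242
pt59 pt74 7.00 0.21048
pt59 pt75 7.00 0.19414
pt60 pt61 7.00 0.21246
pt60 pt74 7.00 0.21003
pt60 pt75 7.00 0.19471
pt61 pt74 2.00 0.22424
pt61 pt75 6.00 0.21081
pt74 pt75 7.00 0.21033
"""

NEW = """\
pt59 pt60 20.00 0.19603
pt59 pt61 21.17 0.21242
pt59 pt74 23.17 0.21048
pt59 pt75 24.00 0.19414
pt60 pt61 21.50 0.21246
pt60 pt74 20.00 0.21003
pt60 pt75 19.50 0.19471
pt61 pt74 16.17 0.22424
pt61 pt75 20.67 0.21081
pt74 pt75 19.67 0.21033
"""


def parse(table):
    """Map (sample1, sample2) -> (distance, mismatches)."""
    rows = {}
    for line in table.strip().splitlines():
        s1, s2, dist, mism = line.split()
        rows[(s1, s2)] = (float(dist), float(mism))
    return rows


def compare(old, new):
    """Return (pair, old distance, new distance, difference, mismatches equal?)."""
    a, b = parse(old), parse(new)
    out = []
    for pair in sorted(a.keys() & b.keys()):
        (d_old, m_old), (d_new, m_new) = a[pair], b[pair]
        out.append((pair, d_old, d_new, d_new - d_old, m_old == m_new))
    return out


rows = compare(OLD, NEW)
for (s1, s2), d_old, d_new, diff, same_mism in rows:
    print(f"{s1} {s2}: 0.3.2={d_old:.2f} 0.3.6={d_new:.2f} "
          f"diff={diff:+.2f} mismatches_equal={same_mism}")
```

On these two tables, every pair has a larger Distance in 0.3.6, while the Mismatches column is unchanged between versions.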
With 2 samples (see ska2_input.tsv), both 0.3.2 and 0.3.6 gave the same result:
Sample1 Sample2 Distance Mismatches
pt60 pt61 12.00 0.21246
I checked the documentation and didn't see any changes to the options used in these commands. Let me know what else I should try. Thanks!!