different results from ska2 0.3.2 and 0.3.6

Open danrlu opened this issue 11 months ago • 2 comments

I ran the latest version, 0.3.6 (installed today with conda install -c bioconda ska2), on our data, and the results differ from those of 0.3.2. The commands were the same for all of the following analyses:

ska build -o seqs_ska2_strict --min-count 4 --min-qual 20 --threads 4 -k 31 --qual-filter strict -f ska2_input.tsv
ska distance --filter-ambiguous seqs_ska2_strict.skf > distances_ska2_strict.txt

We have 40 samples.

0.3.2: [image]

0.3.6: [image]

To help debug, I took 5 samples from the cluster in the bottom right corner and subsampled them to 1/5 of the read count so the files are smaller. You can find them here: https://github.com/danrlu/debug_data/tree/main/ska

With 5 samples (see ska2_input_more.tsv), 0.3.2 gives:

Sample1	Sample2	Distance	Mismatches
pt59	pt60	6.00	0.19603
pt59	pt61	5.00	0.21242
pt59	pt74	7.00	0.21048
pt59	pt75	7.00	0.19414
pt60	pt61	7.00	0.21246
pt60	pt74	7.00	0.21003
pt60	pt75	7.00	0.19471
pt61	pt74	2.00	0.22424
pt61	pt75	6.00	0.21081
pt74	pt75	7.00	0.21033

and 0.3.6 gives:

Sample1	Sample2	Distance	Mismatches
pt59	pt60	20.00	0.19603
pt59	pt61	21.17	0.21242
pt59	pt74	23.17	0.21048
pt59	pt75	24.00	0.19414
pt60	pt61	21.50	0.21246
pt60	pt74	20.00	0.21003
pt60	pt75	19.50	0.19471
pt61	pt74	16.17	0.22424
pt61	pt75	20.67	0.21081
pt74	pt75	19.67	0.21033
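
To make the difference easier to see, the two 5-sample tables can be lined up pair by pair. The script below is only a minimal sketch and not part of ska: the helper names `parse` and `compare` are made up for illustration, and the tables are pasted verbatim from above. On this data it suggests that the Mismatches column is identical between the two versions, while every Distance value is larger in 0.3.6.

```python
import csv
import io

# Tab-separated `ska distance` output pasted from this issue.
OLD = """Sample1\tSample2\tDistance\tMismatches
pt59\tpt60\t6.00\t0.19603
pt59\tpt61\t5.00\t0.21242
pt59\tpt74\t7.00\t0.21048
pt59\tpt75\t7.00\t0.19414
pt60\tpt61\t7.00\t0.21246
pt60\tpt74\t7.00\t0.21003
pt60\tpt75\t7.00\t0.19471
pt61\tpt74\t2.00\t0.22424
pt61\tpt75\t6.00\t0.21081
pt74\tpt75\t7.00\t0.21033
"""

NEW = """Sample1\tSample2\tDistance\tMismatches
pt59\tpt60\t20.00\t0.19603
pt59\tpt61\t21.17\t0.21242
pt59\tpt74\t23.17\t0.21048
pt59\tpt75\t24.00\t0.19414
pt60\tpt61\t21.50\t0.21246
pt60\tpt74\t20.00\t0.21003
pt60\tpt75\t19.50\t0.19471
pt61\tpt74\t16.17\t0.22424
pt61\tpt75\t20.67\t0.21081
pt74\tpt75\t19.67\t0.21033
"""


def parse(table):
    """Map (Sample1, Sample2) -> (Distance, Mismatches) for one table."""
    reader = csv.DictReader(io.StringIO(table), delimiter="\t")
    return {
        (row["Sample1"], row["Sample2"]): (
            float(row["Distance"]),
            float(row["Mismatches"]),
        )
        for row in reader
    }


def compare(old, new):
    """For each pair present in both tables, report both distances,
    their difference, and whether the Mismatches values agree."""
    rows = []
    for pair in sorted(old.keys() & new.keys()):
        d_old, m_old = old[pair]
        d_new, m_new = new[pair]
        rows.append((pair, d_old, d_new, d_new - d_old, m_old == m_new))
    return rows


rows = compare(parse(OLD), parse(NEW))
for (s1, s2), d_old, d_new, delta, same_mm in rows:
    print(f"{s1}\t{s2}\t{d_old:.2f}\t{d_new:.2f}\t{delta:+.2f}\t"
          f"mismatches_equal={same_mm}")
```

The same functions should work on the full 40-sample output if the two files are read from disk instead of pasted in.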

With 2 samples (see ska2_input.tsv), both 0.3.2 and 0.3.6 gave the same results:

Sample1	Sample2	Distance	Mismatches
pt60	pt61	12.00	0.21246

I checked the documentation and didn't see any changes to the settings for the options in these commands. Let me know what else I should try. Thanks!

danrlu, Mar 07 '24 23:03