David Koslicki

228 comments by David Koslicki

@IsaacT1123 a friendly reminder to reference this issue in your commits! It's a great way to keep track of changes being made, and is also a visible history of your...

Still in the process of diagnosing the issue. Will need to return to it later @dkoslicki

Definitely an issue with the `kmc_dump` and then the `intersection`: with
```bash
/usr/bin/time python ${scriptsDir}/MakeStreamingDNADatabase.py filenames.txt TrainingDatabase.h5 -k 10
/usr/bin/time python ${scriptsDir}/StreamingQueryDNADatabase.py ${testOrganism} TrainingDatabase.h5 results.csv 10-10-1 --sensitive --intersect -c 0...
```

Indeed, it might be the `kmc_tools simple intersect` that's the problem, since:
```
comm -1 -2 ...
```
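`comm -1 -2` prints only the lines common to two sorted files, which makes it a handy cross-check on `kmc_tools simple intersect`. A minimal Python sketch of the same merge-style comparison follows. It assumes both inputs are `kmc_dump`-style text with one k-mer per line followed by whitespace and a count; the helper names `kmers_from_dump` and `comm_common` are hypothetical and not part of CMash or KMC.

```python
def kmers_from_dump(lines):
    """Extract the k-mer column from "KMER<whitespace>COUNT" lines (assumed kmc_dump-style format)."""
    return sorted(line.split()[0] for line in lines if line.strip())


def comm_common(sorted_a, sorted_b):
    """Return items present in both sorted lists, mimicking `comm -1 -2`."""
    common = []
    i, j = 0, 0
    while i < len(sorted_a) and j < len(sorted_b):
        if sorted_a[i] == sorted_b[j]:
            common.append(sorted_a[i])  # shared line: keep it and advance both inputs
            i += 1
            j += 1
        elif sorted_a[i] < sorted_b[j]:
            i += 1  # only in the first input (comm column 1, suppressed by -1)
        else:
            j += 1  # only in the second input (comm column 2, suppressed by -2)
    return common


reads_dump = ["GTT\t2", "AAC\t1", "ACG\t5", "CGT\t1"]
training_dump = ["TTT\t4", "ACG\t1", "GTT\t3"]
print(comm_common(kmers_from_dump(reads_dump), kmers_from_dump(training_dump)))
# ['ACG', 'GTT']
```

If this comparison and the KMC intersection disagree on the same dumps, that points at the intersection step. As with `comm`, both inputs must be sorted in the same byte order (e.g. `LC_ALL=C sort` on the shell side).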

@IsaacT1123 So changing `-fa` to `-fm` (KMC's multi-FASTA input mode) in `Intersect.count_training_kmers()` fixed one issue: tests now pass if you use a single k-mer size, but still don't pass if you use multiple...

@IsaacT1123 Finally figured out the issue, and it's "obvious" now that I see what happens: when intersecting the reads' 21-mers with the training database's 21-mers, you miss out on some...
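The comment is truncated, so the exact failure isn't spelled out here. One general effect that fits the description: if smaller k-mers are only recovered as prefixes of the 21-mers that survived the intersection, then any small k-mer shared by the reads and the training genomes whose extending 21-mer is *not* shared gets dropped. The toy sketch below shows this with K = 4 and k = 3 instead of 21. It ignores reverse complements (canonical k-mers), and the function names are hypothetical, not CMash code.

```python
def kmers(seq, k):
    """All length-k substrings of seq (reverse complements ignored: a simplifying assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def prefix_matches(read, training, big_k, small_k):
    """Small k-mers recovered only as prefixes of big k-mers shared by read and training."""
    shared_big = kmers(read, big_k) & kmers(training, big_k)
    return {kmer[:small_k] for kmer in shared_big}


def direct_matches(read, training, small_k):
    """Small k-mers shared by read and training, computed directly at size small_k."""
    return kmers(read, small_k) & kmers(training, small_k)


training = "ACGTAC"
read = "ACGTTT"
print(sorted(prefix_matches(read, training, 4, 3)))  # ['ACG']
print(sorted(direct_matches(read, training, 3)))     # ['ACG', 'CGT']
```

Here `CGT` occurs in both sequences, but its extending 4-mer `CGTT` is absent from the training sequence, so the prefix route never sees it. The prefix set is always a subset of the direct intersection, so this kind of prefilter can only undercount. In addition, the last K - k small k-mers of every read (here, `TTT`) are never prefixes of any of that read's K-mers.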

@luizirber
> > i.e. The KMC prefilter **at this time should only be used when the training database is constructed with a k-mer size of K and the `StreamingQueryDNADatabase.py` is...**

@luizirber I would be interested to see when non-“CMash paper” is underestimating badly (as in, the worst case of 0.51 when it should be 0.995). Maybe you could send me...

@x-zang This is one of the things we’ll be discussing during our meeting tomorrow and would be a good first project during your rotation.

Do you mind providing a bit more information? For example:
1. Were any error/warning messages output?
2. Do any temporary files get created (e.g. `/output_folder/input_file.fasta-y30.txt`)?
3. Have you tried it...