
Merged libraries do not show lower-count k-mers

Open bnavarrodominguez opened this issue 2 months ago • 1 comments

Dear Gene,

I have a large sequencing library that I needed to split into 10 smaller files so I could run FastK on different nodes. Following the instructions in the README, I ran FastK on the split files with the following command:

for file in library_split_*; do mkdir tmp.${file}; FastK -v -t5 -k31 -M50 -T24 -Ptmp.${file} $file; done

This produced a *.hist and a *.ktab file for each *.split.fastq file. I looked at the k-mer count histogram for each split file:

Histex -G library_split_01.hist > library_split_01.histogram
$ head library_split_01.histogram
1       6062202409
2       3370987439
3       1728287765
4       894614808
5       482057568

I then merged the split files using Fastmerge, and generated histograms for the merged k-mer database:

Fastmerge -T12 -t -h library_fastmerged library_split_*ktab
Histex -G library_fastmerged.hist > library_fastmerged.histogram
$ head library_fastmerged.histogram

4       2032698049
5       522131235
6       134785342
7       33514609
8       420971175

The merged library histogram contains no k-mers with a count lower than 4. I repeated the process a few times with different combinations of files, and the merged histograms consistently lack the low counts (they start at 4 or 5). I'm unsure whether this behavior is expected, because I do not understand why single-occurrence k-mers would be absent after merging. Is this a bug, or am I misunderstanding or misusing the tool?
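
For reference, here is a minimal sketch of how the lowest count in each histogram can be checked. It assumes the two-column layout shown above (count, then number of k-mers), and `min_count` is a hypothetical helper written for this post, not part of FastK.

```shell
# Print the smallest k-mer count present in a two-column histogram file.
# Blank lines and lines whose first field is not an integer are skipped.
min_count() {
    awk 'NF >= 2 && $1 ~ /^[0-9]+$/ {
             if (!seen || $1 + 0 < min) { min = $1 + 0; seen = 1 }
         }
         END { if (seen) print min }' "$1"
}

# Compare one split histogram with the merged one (file names from this post).
for hist in library_split_01.histogram library_fastmerged.histogram; do
    if [ -f "$hist" ]; then
        printf '%s\t%s\n' "$hist" "$(min_count "$hist")"
    fi
done
```

On the files above this should report 1 for the split histogram and 4 for the merged one, matching what `head` shows.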

Thanks for your assistance!

bnavarrodominguez, Dec 18 '24 16:12