Merged libraries do not show lower-count k-mers
Dear Gene,
I have a large sequencing library that I needed to split into 10 smaller files so I could run FastK on different nodes. Following the instructions in the README, I ran FastK on the split files with the following command:
for file in library_split_*; do mkdir tmp.${file}; FastK -v -t5 -k31 -M50 -T24 -Ptmp.${file} $file; done
This produced a *.hist and a *.ktab file for each *.split.fastq file. I looked at the k-mer count histogram for each split file:
Histex -G library_split_01.hist > library_split_01.histogram
$ head library_split_01.histogram
1 6062202409
2 3370987439
3 1728287765
4 894614808
5 482057568
I then merged the split files using Fastmerge and generated histograms for the merged k-mer database:
Fastmerge -T12 -t -h library_fastmerged library_split_*ktab
Histex -G library_fastmerged.hist > library_fastmerged.histogram
$ head library_fastmerged.histogram
4 2032698049
5 522131235
6 134785342
7 33514609
8 420971175
I noticed that there are no k-mers with a count lower than 4 in the merged library histogram. I repeated the process a few times, combining different files, and the merged histograms consistently lack smaller k-mer counts (i.e., they start at 4 or 5). I’m unsure if this behavior is expected, as I do not understand why there are no single-occurrence k-mers. Is this a bug, or am I misunderstanding or misusing the tool?
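For reference, here is a minimal sketch of how the lowest count in each histogram can be checked. It assumes the two-column "count frequency" text layout shown in the head output above. The helper name lowest_count is my own and is not part of FastK.

```shell
# Hypothetical helper (not part of FastK): print the smallest count value
# found in a two-column "count frequency" text histogram.
lowest_count() {
    awk 'NF >= 2 && $1 ~ /^[0-9]+$/ {
             if (min == "" || $1 + 0 < min) min = $1 + 0
         }
         END { if (min != "") print min }' "$1"
}

# Report the lowest count for each split histogram and for the merged one.
# Files that do not exist are skipped.
for f in library_split_*.histogram library_fastmerged.histogram; do
    [ -e "$f" ] || continue
    printf '%s\t%s\n' "$f" "$(lowest_count "$f")"
done
```

With the files above, this should list the split histograms as starting at 1 and the merged histogram as starting at 4.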
Thanks for your assistance!