Modkit motif search
Hi,
I'm using modkit pileup and modkit motif search with bacterial samples (DNA). I'm trying to detect m6A modifications and known motifs for my species.
The problem I'm facing is that "modkit motif search" identifies the correct motifs, but the number of occurrences it reports is much higher than the number actually present in my genome.
For example, modkit motif search reports more than 5700 occurrences, whereas the motif "CRAAAAR" is only present 3200 times in my genome. How can I explain these results?
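For reference, here is a minimal Python sketch of one way to count occurrences of an IUPAC motif such as "CRAAAAR" in a genome sequence. It is not how modkit counts sites. The function names are hypothetical, and it assumes that overlapping matches and both strands should be counted, which can change the total a lot for a non-palindromic motif.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each IUPAC code, used to build the reverse-complement motif.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    """Return the reverse complement of an IUPAC motif."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def count_motif(sequence: str, motif: str) -> int:
    """Count matches of an IUPAC motif on the given strand, overlaps included."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # The zero-width lookahead lets overlapping occurrences be counted.
    return sum(1 for _ in re.finditer(f"(?=({pattern}))", sequence.upper()))


def count_motif_both_strands(sequence: str, motif: str) -> int:
    """Count strand-specific occurrences: forward hits plus reverse-strand hits.

    Reverse-strand hits are found by searching the forward sequence for the
    reverse complement of the motif. A palindromic motif is therefore counted
    once per strand at each site.
    """
    return count_motif(sequence, motif) + count_motif(
        sequence, reverse_complement(motif)
    )


if __name__ == "__main__":
    # Toy sequence for illustration only; a real run would use the genome FASTA.
    toy = "CGAAAAGTTTCTTTTCG"
    print(count_motif(toy, "CRAAAAR"), count_motif_both_strands(toy, "CRAAAAR"))
```

Comparing the forward-only count with the both-strand count is a quick check on whether the 3200 figure and the modkit figure are counting the same thing.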
PS: I used the raw files from modkit pileup without any filtering.
Thanks, Rania
Hello @rania-o,
Could you tell me which basecalling/modification models you used? If you have not filtered the base modification calls (i.e. you used --no-filtering with modkit pileup), I would not be surprised if you get a lot of false positive motifs. The motif search function in Modkit doesn't do any "filtering" of its own to remove potentially false positive base modification calls.
I would try performing pileup without --no-filtering.
Hello @ArtRand
Thanks for your reply.
I used the sup basecalling model "[email protected]", and this is the command line:
dorado basecaller [email protected] $s4_rep3 --modified-bases 6mA > sample_sup_m6a.bam
And this is the one I used for modkit pileup:
modkit pileup $input $out --log-filepath $out/pileup_m6alog --with-header
For motif search I used the raw pileup file. Separately, to filter my m6A results, I kept only positions with a coverage of at least 30 and at least 80% modification.
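To make that filtering step concrete, here is a rough Python sketch of it. It assumes the pileup output is a bedMethyl table with the modification code in column 4, the valid coverage in column 10 and the percent modified in column 11, and that 6mA is reported with the code "a". These should be checked against the header written by --with-header. The function name and constants are hypothetical.

```python
MIN_COVERAGE = 30        # coverage threshold quoted above
MIN_PERCENT_MOD = 80.0   # percent-modified threshold quoted above

# Assumed 0-based column positions in the bedMethyl table.
COL_MOD_CODE = 3
COL_VALID_COV = 9
COL_PERCENT_MOD = 10


def filter_bedmethyl(lines, mod_code="a",
                     min_cov=MIN_COVERAGE, min_pct=MIN_PERCENT_MOD):
    """Keep rows for one modification code that pass both thresholds."""
    kept = []
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if len(fields) <= COL_PERCENT_MOD:
            continue  # blank or truncated line
        try:
            coverage = int(fields[COL_VALID_COV])
            percent = float(fields[COL_PERCENT_MOD])
        except ValueError:
            continue  # non-numeric row, e.g. the header line
        if (fields[COL_MOD_CODE] == mod_code
                and coverage >= min_cov and percent >= min_pct):
            kept.append(line.rstrip("\n"))
    return kept


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python filter_pileup.py pileup.bed > filtered.bed
    with open(sys.argv[1]) as handle:
        for row in filter_bedmethyl(handle):
            print(row)
```

This only reproduces the post-hoc filter on the pileup table; it does not change which calls modkit motif search sees unless the filtered file is the one given to it.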
Rania