modkit icon indicating copy to clipboard operation
modkit copied to clipboard

bedMethyl file has no motif information

Open uuwuyi opened this issue 1 year ago • 7 comments

Hi there,

Here is the whole process I run after dorado 5mC and 5hmC basecalling.

dorado aligner reference.fasta dorado_5mC_5hmC.bam > aligned.bam
samtools sort aligned.bam > aligned_sort.bam
samtools index aligned_sort.bam

I have tried modkit pileup aligned_sort.bam pileup_test.bed --cpg --log-filepath pileup.log --ref reference.fasta and

modkit pileup aligned_sort.bam pileup.bed --log-filepath pileup.log --ref reference.fasta -t 8
modkit find-motifs --in-bedmethyl pileup.bed --ref reference.fasta -o motif.tsv -t 8

However, in the "modified base code and motif" column of the modbed file, it only contains the modified base code not motif. and the motif.tsv is empty. Do you have an idea what the problem would be? Thanks!

uuwuyi avatar Jun 20 '24 17:06 uuwuyi

Hello @uuwuyi,

However, in the "modified base code and motif" column of the modbed file,

I'm not sure exactly which file you're referring to. Could you tell me the name of the file you're referring to and paste the first few rows of that file?

Also, could you be sure that you're using the latest release candidate?

You can check by doing

$ modkit --version
# should say mod_kit 0.3.1

Happy to help, I just need a little more information.

ArtRand avatar Jun 20 '24 21:06 ArtRand

Hi @ArtRand,

my modkit version is 0.3.0. Below is part of the pileup.bed file. Sorry for my confusing way to call it. I mean the bedMethyl table. In the fourth column, it only contains the modified base code. I wonder if this is the reason that I didn't get the motif using find-motifs subcommand. Thanks again for your help!

2 11300 11301 h 1 - 11300 11301 255,0,0 1 0.00 0 1 0 0 0 00 2 11300 11301 m 1 - 11300 11301 255,0,0 1 0.00 0 1 0 0 0 00 2 11603 11604 h 1 - 11603 11604 255,0,0 1 0.00 0 1 0 0 0 00 2 11603 11604 m 1 - 11603 11604 255,0,0 1 0.00 0 1 0 0 0 00 2 11614 11615 h 1 - 11614 11615 255,0,0 1 0.00 0 1 0 0 0 00 2 11614 11615 m 1 - 11614 11615 255,0,0 1 0.00 0 1 0 0 0 00 2 11620 11621 h 1 - 11620 11621 255,0,0 1 0.00 0 1 0 0 0 00 2 11620 11621 m 1 - 11620 11621 255,0,0 1 0.00 0 1 0 0 0 00 2 11661 11662 h 1 - 11661 11662 255,0,0 1 0.00 0 1 0 0 0 00 2 11661 11662 m 1 - 11661 11662 255,0,0 1 0.00 0 1 0 0 0 00

uuwuyi avatar Jun 20 '24 21:06 uuwuyi

Hello @uuwuyi,

The fourth column of the bedMethyl table should only have the modified base code, the snippet you have here is what I would expect. Version v0.3.0 had a bug where the output table would not be printed correctly. Could you try v0.3.1rc1? If you continue to see the issue, could you attach the log from the find-motifs run?

ArtRand avatar Jun 20 '24 22:06 ArtRand

Hi @ArtRand, find_motif_test.log find_motif.log

I have updated the modkit to v0.3.1 and run find-motifs on my bacterial samples (~80 samples). As the log file shows, it is indeed searching for motifs. But I am a bit surprised that all of the motif.tsv files I have got have no motif found. I am unsure if I did something wrong. I tried to lower down the and but it didn't help. Many thanks again for your help!

uuwuyi avatar Jun 22 '24 11:06 uuwuyi

Hello @uuwuyi,

Thanks for sending the logs over. It looks like the overall level of modification in this sample is low. From the logs you can trace the progression of the search. You start with 153 high-frequency contexts (m and h combined). You can see that the first stage finds some enriched sites but then they are discarded, for example:

[src/find_motifs/mod.rs::1893][2024-06-21 18:11:14][DEBUG] discarding TGAYC[m]GD, high-modified sites too low, 38

You could lower the --min-sites option even more (maybe 30), however I can't say whether this makes sense for your experiment. Another option is to take some of the motifs from the logs (e.g. YC[m]GD) and pass them as known motifs --known-motif YCCGD 2 m and the command should measure the methylation level for those sites. Hope this helps, happy to answer additional questions.

ArtRand avatar Jun 24 '24 18:06 ArtRand

Hi @ArtRand,

I am using 30x sequencing data. I have tried lowering --min-sites to 30 and it still didn't find any motifs. Do you have any flags and thresholds that would recommend considering it's 30x depth? Also, I run the dorado modified basecalling using the command below, if you don't mind checking if this is ok for correct basecalling. I used fast5 files instead of the pod5. Could it be a reason for low 5mC modification detection? Thanks again for your patience and help! dorado basecaller /path/to/[email protected] /path/to/fast5 --modified-bases-models /path/to/[email protected]_5mCG@v0 > dorado_calls.bam

uuwuyi avatar Jun 26 '24 14:06 uuwuyi

Hello @uuwuyi,

Did you use --min-sites 30 and --min-frac-mod 0.5? Could you attach the log?

ArtRand avatar Jun 27 '24 13:06 ArtRand