modkit sample-probs with 2OmeACUG
Hi, @ArtRand
I encountered some issues while using modkit sample-probs with 2OmeACUG.
Here is my code:
${modkitDir}/modkit sample-probs ${bamfile}/input_merge_sup_m6A_pseU_m5C_inosine_2OmeACUG.mod.sorted.bam -t 40 --log-filepath ${bamfile}/input_merge_sup_2OmeA_sample_prob/input_merge_sup_2OmeA.sample_pob.log --percentiles 0.1,0.25,0.5,0.75,0.9 --out-dir ${bamfile}/input_merge_sup_2OmeA_sample_prob --hist --num-reads 20084 --include-bed ${index_dir}/A_0_transcripts.bed --only-mapped
${modkitDir}/modkit sample-probs ${bamfile}/input_merge_sup_m6A_pseU_m5C_inosine_2OmeACUG.mod.sorted.bam -t 40 --log-filepath ${bamfile}/input_merge_sup_2OmeG_sample_prob/input_merge_sup_2OmeG.sample_pob.log --percentiles 0.1,0.25,0.5,0.75,0.9 --out-dir ${bamfile}/input_merge_sup_2OmeG_sample_prob --hist --num-reads 20084 --include-bed ${index_dir}/G_0_transcripts.bed --only-mapped
I used probabilities.tsv to generate the plots.
The first and second images show 2OmeA.
The third and fourth images show 2OmeG.
Why is there such a huge difference between 2OmeA and 2OmeG?
Best wishes, Kirito
The primary difference here is the number of modified bases predicted by each model. The G model outputs only the 2'Ome modified base, while the A mods model outputs 2'Ome along with m6A and inosine. This results in a larger range of possible outputs and associated distributions. If you use modkit to ignore m6A and inosine, I imagine you would find that the distributions are much more similar, assuming there are no real 2'Ome bases in this sample and that this is an interrogation of the false-positive distribution of probabilities. I hope this helps, but please reach out if more clarification would help.
Hi, @marcus1487 Thank you for your prompt reply.
I only used the 2'Ome modification to plot the probabilities. I have cropped a portion of the modification image.
The first image shows only the A bases from counts.html.
The second image shows only the G bases from counts.html.
It can be seen that in a considerable part of the area, there is no 2'Ome G modification.
After running modkit sample-probs, I ran modkit pileup for the 2'Ome modifications with --mod-thresholds set to 0.97.
Here is my code for 2'OmeU; I used the same --mod-thresholds value for the other modifications:
${modkitDir}/modkit pileup input_merge_sup_m6A_pseU_m5C_inosine_2OmeACUG.mod.sorted.bam input_merge_sup.pass.2OmeU.bed --ref gencode.vM33.normal.transcripts.fa --include-bed pseU_0_transcripts.bed --motif T 0 --log-filepath input_merge_sup_2OmeU.log --num-reads 20084 --max-depth 20000 --filter-threshold T:0.9 --mod-thresholds 19227:0.97 -t 40
awk '{if($4==19227) print$0}' input_merge_sup.pass.2OmeU.bed > input_merge_sup.pass.2OmeU.filter.bed
For each 2'Ome site, I required coverage >= 20, modnum >= 20, and site ratio >= 0.1.
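For reference, the site-level filter described above could be sketched roughly as follows. This is only an illustration, not the exact command that was run: the function name `filter_2ome_sites` is hypothetical, and the column positions are an assumption based on the bedMethyl layout that modkit pileup usually writes (column 4 = modification code, column 10 = valid coverage, column 11 = percent modified on a 0-100 scale, column 12 = number of modified calls). Please check these against the output of your modkit version before relying on it.

```shell
# Hypothetical helper: keep bedMethyl rows for one modification code that pass
# the coverage, modified-count, and ratio cutoffs.
# ASSUMED column layout (verify against your modkit version):
#   $4  = modification code
#   $10 = valid coverage
#   $11 = percent modified (0-100, so a ratio of 0.1 becomes 10)
#   $12 = number of modified calls
filter_2ome_sites() {
    bed_file="$1"
    mod_code="${2:-19227}"   # 19227 is the code used for 2'OmeU in the pileup above
    awk -v code="$mod_code" -v min_cov=20 -v min_mod=20 -v min_pct=10 '
        $4 == code && $10 >= min_cov && $12 >= min_mod && $11 >= min_pct
    ' "$bed_file"
}
```

It could then be called as `filter_2ome_sites input_merge_sup.pass.2OmeU.bed > input_merge_sup.pass.2OmeU.filter.bed`, which would also make the separate awk step on the modification code unnecessary.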
However, when I mapped the 2'Ome sites to the genome, I found that most of them were enriched at the 3' UTR end. This leaves me very confused.
Best wishes, Kirito