uncertain strand direction in the pileup bed file
Hi,
I found a significant issue in the pileup file where some positions have both "+" and "-" directions. However, a specific position cannot have two A bases. it should be either A or T. This issue makes the downstream analysis difficult.
Hi,
I found a significant issue in the pileup file where some positions have both "+" and "-" directions. However, a specific position cannot have two A bases. it should be either A or T. This issue makes the downstream analysis difficult.
![]()
I already tried the latest version 0.4.1, and the issue is still there.
Hello @SimonChen1997,
This may happen when you have a few reads that have a mis-call at a position causing them to have a base modification call. You can suppress emitting these records by using the --motif and --include-bed arguments.
You can see this with the test data using the following commands:
ref=tests/resources/CGI_ladder_3.6kb_ref.fa
bam=tests/resources/bc_anchored_10_reads.sorted.bam
${modkit} pileup ${bam} pileup.bed --with-header
${modkit} pileup ${bam} pileup_cg0.bed --with-header --motif CG 0 --ref ${ref}
produces the following
# pileup.bed
oligo_1512_adapters 63 64 h 5 + 63 64 255,0,0 5 20.00 1 0 4 0 1 0 0
oligo_1512_adapters 63 64 m 5 + 63 64 255,0,0 5 80.00 4 0 1 0 1 0 0
oligo_1512_adapters 63 64 h 1 - 63 64 255,0,0 1 100.00 1 0 0 0 0 3 0
oligo_1512_adapters 63 64 m 1 - 63 64 255,0,0 1 0.00 0 0 1 0 0 3 0
# pileup_cg0.bed
oligo_1512_adapters 63 64 h 5 + 63 64 255,0,0 5 20.00 1 0 4 0 1 0 0
oligo_1512_adapters 63 64 m 5 + 63 64 255,0,0 5 80.00 4 0 1 0 1 0 0
The output you have posted is a little hard for me to parse, however. It looks like there are repeat records for a genomic position, which shouldn't happen. Could you show me the commands you're running to produce this output?
Hello @SimonChen1997,
This may happen when you have a few reads that have a mis-call at a position causing them to have a base modification call. You can suppress emitting these records by using the
--motifand--include-bedarguments.You can see this with the test data using the following commands:
ref=tests/resources/CGI_ladder_3.6kb_ref.fa bam=tests/resources/bc_anchored_10_reads.sorted.bam ${modkit} pileup ${bam} pileup.bed --with-header ${modkit} pileup ${bam} pileup_cg0.bed --with-header --motif CG 0 --ref ${ref}produces the following
# pileup.bed oligo_1512_adapters 63 64 h 5 + 63 64 255,0,0 5 20.00 1 0 4 0 1 0 0 oligo_1512_adapters 63 64 m 5 + 63 64 255,0,0 5 80.00 4 0 1 0 1 0 0 oligo_1512_adapters 63 64 h 1 - 63 64 255,0,0 1 100.00 1 0 0 0 0 3 0 oligo_1512_adapters 63 64 m 1 - 63 64 255,0,0 1 0.00 0 0 1 0 0 3 0 # pileup_cg0.bed oligo_1512_adapters 63 64 h 5 + 63 64 255,0,0 5 20.00 1 0 4 0 1 0 0 oligo_1512_adapters 63 64 m 5 + 63 64 255,0,0 5 80.00 4 0 1 0 1 0 0The output you have posted is a little hard for me to parse, however. It looks like there are repeat records for a genomic position, which shouldn't happen. Could you show me the commands you're running to produce this output?
Hi,
Thanks for your reply. The repeat records are from biological replicates. However, the uncertain strand happed within one biological replicates. I'm doing bacteria methylation, so just wondering if --motif flag also works for bacteria, because there are many motifs in bacteria and there is no m6A bed for bacteria to my current knowledge.
the command I used were
modkit pileup $minimap2_primary/ecoli_${SLURM_ARRAY_TASK_ID}_primary.sorted.bam $modkit_pileup_primary/ecoli_${SLURM_ARRAY_TASK_ID}_pileup.bed
Hello @SimonChen1997,
Yes the --motif argument will work for bacterial genomes. (You may also be interested in modkit motif search). You can use a single base "motif" to narrow to a specific strand, e.g. --motif A 0 will only emit bedmethyl records for reads aligning to adenine bases (on either the sense or antisense strands).
Hello @SimonChen1997,
Yes the
--motifargument will work for bacterial genomes. (You may also be interested inmodkit motif search). You can use a single base "motif" to narrow to a specific strand, e.g.--motif A 0will only emit bedmethyl records for reads aligning to adenine bases (on either the sense or antisense strands).
Hi ArtRand,
By using the --motif A 0, mis-call results were suppressed. Thank you so much.
Cheers, Ziming
@SimonChen1997 great, happy to help.