modkit icon indicating copy to clipboard operation
modkit copied to clipboard

uncertain strand direction in the pileup bed file

Open SimonChen1997 opened this issue 1 year ago • 4 comments

Hi,

I found a significant issue in the pileup file where some positions have both "+" and "-" directions. However, a specific position cannot have two A bases. it should be either A or T. This issue makes the downstream analysis difficult.

image

SimonChen1997 avatar Oct 02 '24 02:10 SimonChen1997

Hi,

I found a significant issue in the pileup file where some positions have both "+" and "-" directions. However, a specific position cannot have two A bases. it should be either A or T. This issue makes the downstream analysis difficult.

image

I already tried the latest version 0.4.1, and the issue is still there.

SimonChen1997 avatar Oct 03 '24 01:10 SimonChen1997

Hello @SimonChen1997,

This may happen when you have a few reads that have a mis-call at a position causing them to have a base modification call. You can suppress emitting these records by using the --motif and --include-bed arguments.

You can see this with the test data using the following commands:


ref=tests/resources/CGI_ladder_3.6kb_ref.fa
bam=tests/resources/bc_anchored_10_reads.sorted.bam
${modkit} pileup ${bam} pileup.bed --with-header
${modkit} pileup ${bam} pileup_cg0.bed --with-header --motif CG 0 --ref ${ref}

produces the following

# pileup.bed
oligo_1512_adapters 63  64  h   5   +   63  64  255,0,0 5   20.00   1   0   4   0   1   0   0
oligo_1512_adapters 63  64  m   5   +   63  64  255,0,0 5   80.00   4   0   1   0   1   0   0
oligo_1512_adapters 63  64  h   1   -   63  64  255,0,0 1   100.00  1   0   0   0   0   3   0
oligo_1512_adapters 63  64  m   1   -   63  64  255,0,0 1   0.00    0   0   1   0   0   3   0

# pileup_cg0.bed
oligo_1512_adapters 63  64  h   5   +   63  64  255,0,0 5   20.00   1   0   4   0   1   0   0
oligo_1512_adapters 63  64  m   5   +   63  64  255,0,0 5   80.00   4   0   1   0   1   0   0

The output you have posted is a little hard for me to parse, however. It looks like there are repeat records for a genomic position, which shouldn't happen. Could you show me the commands you're running to produce this output?

ArtRand avatar Oct 06 '24 12:10 ArtRand

Hello @SimonChen1997,

This may happen when you have a few reads that have a mis-call at a position causing them to have a base modification call. You can suppress emitting these records by using the --motif and --include-bed arguments.

You can see this with the test data using the following commands:

ref=tests/resources/CGI_ladder_3.6kb_ref.fa
bam=tests/resources/bc_anchored_10_reads.sorted.bam
${modkit} pileup ${bam} pileup.bed --with-header
${modkit} pileup ${bam} pileup_cg0.bed --with-header --motif CG 0 --ref ${ref}

produces the following

# pileup.bed
oligo_1512_adapters 63  64  h   5   +   63  64  255,0,0 5   20.00   1   0   4   0   1   0   0
oligo_1512_adapters 63  64  m   5   +   63  64  255,0,0 5   80.00   4   0   1   0   1   0   0
oligo_1512_adapters 63  64  h   1   -   63  64  255,0,0 1   100.00  1   0   0   0   0   3   0
oligo_1512_adapters 63  64  m   1   -   63  64  255,0,0 1   0.00    0   0   1   0   0   3   0

# pileup_cg0.bed
oligo_1512_adapters 63  64  h   5   +   63  64  255,0,0 5   20.00   1   0   4   0   1   0   0
oligo_1512_adapters 63  64  m   5   +   63  64  255,0,0 5   80.00   4   0   1   0   1   0   0

The output you have posted is a little hard for me to parse, however. It looks like there are repeat records for a genomic position, which shouldn't happen. Could you show me the commands you're running to produce this output?

Hi,

Thanks for your reply. The repeat records are from biological replicates. However, the uncertain strand happed within one biological replicates. I'm doing bacteria methylation, so just wondering if --motif flag also works for bacteria, because there are many motifs in bacteria and there is no m6A bed for bacteria to my current knowledge.

the command I used were

modkit pileup $minimap2_primary/ecoli_${SLURM_ARRAY_TASK_ID}_primary.sorted.bam $modkit_pileup_primary/ecoli_${SLURM_ARRAY_TASK_ID}_pileup.bed

SimonChen1997 avatar Oct 06 '24 22:10 SimonChen1997

Hello @SimonChen1997,

Yes the --motif argument will work for bacterial genomes. (You may also be interested in modkit motif search). You can use a single base "motif" to narrow to a specific strand, e.g. --motif A 0 will only emit bedmethyl records for reads aligning to adenine bases (on either the sense or antisense strands).

ArtRand avatar Oct 10 '24 00:10 ArtRand

Hello @SimonChen1997,

Yes the --motif argument will work for bacterial genomes. (You may also be interested in modkit motif search). You can use a single base "motif" to narrow to a specific strand, e.g. --motif A 0 will only emit bedmethyl records for reads aligning to adenine bases (on either the sense or antisense strands).

Hi ArtRand,

By using the --motif A 0, mis-call results were suppressed. Thank you so much.

Cheers, Ziming

SimonChen1997 avatar Oct 14 '24 22:10 SimonChen1997

@SimonChen1997 great, happy to help.

ArtRand avatar Oct 15 '24 14:10 ArtRand