wgsim

read simulation for reverse and forward offset is not consistent.

Open jozerffer opened this issue 12 years ago • 6 comments

Hi,

As mentioned in the title, the simulated mutation offset is not consistent between forward and reverse reads. Please check.

Example:

gi|224589813|ref|NC_000021.8|_9440728_9441261_0:0:1_0:0:0_42167/2 (forward) ATGTCAAGATAATGTCAGAAATTCTTTACAATTGCTTCCAGAAGGAGTAGCCTTTTGATCTAGTGCACAGGTGTCCAGTC (TTTTA) GGCTTCTTAGGGCCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

@gi|224589813|ref|NC_000021.8|_9440248_9440824_0:0:0_0:0:1_93e94/1 (reverse) GCCCTAAGAAGCC (ATAAA) GACTGGACACCTGTGCACTAGATCAAAAGGCTACTCCTTCTGGAAGCAATTGTAAAGAATTTCTGACATTATCTTGACATGA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII

simulated insert gi|224589813|ref|NC_000021.8| 9440811 - A +

jozerffer, Nov 14 '11 06:11

What is the problem? I do not see it.

lh3, Nov 14 '11 13:11

Look at the forward read, which is simulated as C(TTTTA)GGC, and the reverse read, which is GCC(ATAAA)G.

It is inconsistent.

jozerffer, Nov 15 '11 01:11

From the wgsim output, the pileup format shows: 21 9440811 - A +

jozerffer, Nov 15 '11 01:11

Hi, I tried using wgsim and encountered exactly the problem reported by jozerffer. Perhaps I can give a more visual description of it, as follows (a monospaced font would illustrate it a lot better):

Read 1: forward, has an A inserted at the 85th base of the read (shown below in uppercase)
     80 cttttAggctt 90
9440807 ctttt-ggctt 9440816

Read 2: reverse, has a T inserted at the 15th base of the read (shown below in uppercase)
     10 agccaTaaaga 20
9440815 agcca-aaaga 9440806

Going by the simulated list of insertions, I would think that Read 2 should be

10 agccTaaaaga 20
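The comparison above can be checked mechanically. The following is a minimal Python sketch, not wgsim code: it takes the two read windows and the reference row quoted in this comment, applies the listed insertion (an A after 9440811) to the forward-strand reference, and reverse-complements the result to get the window a consistent reverse read would show. The helper names `reverse_complement` and `insert_after` are made up for illustration.

```python
# Complement table for DNA bases, preserving case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_after(ref: str, ref_start: int, pos: int, base: str) -> str:
    """Insert `base` right after 1-based coordinate `pos` of `ref`,
    where `ref` begins at coordinate `ref_start` (hypothetical helper)."""
    cut = pos - ref_start + 1
    return ref[:cut] + base + ref[cut:]


# Windows copied from the two reads; the inserted base is uppercase.
forward_window = "cttttAggctt"  # read 1, bases 80-90
reverse_window = "agccaTaaaga"  # read 2, bases 10-20

# Forward-strand reference for 9440806-9440815, recovered from the
# gapless reference row shown under read 2 (which runs 9440815 -> 9440806).
reference = reverse_complement("agccaaaaga")  # "tcttttggct"

# Apply the listed insertion: A after 9440811 on the forward strand.
expected_forward = insert_after(reference, 9440806, 9440811, "A")
expected_reverse = reverse_complement(expected_forward)

# Bring the observed reverse read back to forward orientation.
observed_as_forward = reverse_complement(reverse_window)

print("expected forward :", expected_forward)     # tcttttAggct
print("observed (rev->fwd):", observed_as_forward)  # tctttAtggct
print("expected reverse :", expected_reverse)     # agccTaaaaga
print("consistent:", observed_as_forward == expected_forward)  # False
```

In forward orientation the observed reverse read carries the A after 9440810 rather than after 9440811, so the two strands disagree by one base on where the insertion sits.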

Your prompt reply to this post is much appreciated.

Thanks, James

jamesls79, Nov 17 '11 07:11

I'm seeing something similar. All (-)-strand reads have two or more nts upstream of their indel, reversed with respect to those on the (+)-strand:

In the following alignment:

GC_C_CG
.. . ..
CGC_CACG
CGC_CACG
CGC_CACG
CGC_CACG
cgcac_cg
cgcac_cg
cgcac_cg
cgcac_cg

the forward -CA becomes AC- on the reverse strand.

I also have a feature request: output a SAM/BAM file with the true alignments (CIGAR + MD tag).

bredeson, May 17 '12 23:05

Duh, I just noticed this too late. It may be the same bug I just fixed in the samtools version:

https://github.com/samtools/samtools/pull/428

jkbonfield, Jul 09 '15 16:07