lumpy-sv icon indicating copy to clipboard operation
lumpy-sv copied to clipboard

Warning from lumpyexpress when calling copy number variants

Open standielpls opened this issue 9 years ago • 11 comments

Hello,

I am running one of the most basic commands, found on the lumpy webpage:

lumpyexpress -B my.bam -S my.splitters.bam -D my.discordants.bam -o output.vcf

to call copy number variants.

One of the warnings that I received was "Warning: only 0 elements in distribution (min:1000). I was wondering what this means, what the "elements" are, and whether this will affect my results. I am still getting a growing vcf file as the output as we speak. Thanks!

standielpls avatar Aug 22 '16 16:08 standielpls

pairend_distro.py needs at least 1000 alignments to determine the stats that lumpy uses to id breakpoints. These alignments must:

have the flags "read paired", "mast reverse strand", "first in pair" set

not have the flags "read unmapped", "mate unmapped", "read reverse strand", "second in pair", "not primary alignment", "read is PCR or optical duplicate", "supplementary alignment"

have an insert size >= 0

have the chrom of both the alignment and mate match

In your case, none of the first 1000000 alignments met all of these criteria.

What did you use to align your data?

How did you align your bam?

ryanlayer avatar Aug 23 '16 17:08 ryanlayer

I used bwa-mem to generate the bam file, they should contain mapped and paired end reads because both gatk and samtools can give me stats regarding the number of properly paired and mapped reads in each file. I have tried to experiment with a smaller file by pulling out only the mapped reads then running lumpyexpress and I still got the same warning message.

standielpls avatar Aug 23 '16 19:08 standielpls

Can you check to see what the flags and insert sizes are on the first 1M records?

On Tue, Aug 23, 2016 at 1:59 PM, Stanley Chin [email protected] wrote:

I used bwa-mem to generate the bam file, they should contain mapped and paired end reads because both gatk and samtools can give me stats regarding the number of properly paired and mapped reads in each file.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arq5x/lumpy-sv/issues/140#issuecomment-241857309, or mute the thread https://github.com/notifications/unsubscribe-auth/AAlDUdYc4ddc9880yNqGVAQEKi07NDHdks5qi1ELgaJpZM4JqD7_ .

Ryan Layer

ryanlayer avatar Aug 23 '16 21:08 ryanlayer

Here are some examples of reads in my bam file:

NB501013:4:HHTTKBGXX:4:11511:1269:15331 99 gi|966749122|ref|NC_006097.4| 500444 40 90S25M36S = 500584 251 GGCTATAGGGGGTGTTTGGAGGATGTGGTCCTCTAAAGGGCTGTGTGGGTCCATAAGATGATCTGGAGCTATAGGGTGATCTCTAAAGGGATGTGGGGCTAGAAGGCTCTCCCTTCCTTCCTTCCCCCCCTTCCACAGCAGGGAGGGTGAA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEE/EEAEEEEAAEEEEAEEEA6<EAEEEEEEEEEEEA MC:Z:111M40S PG:Z:MarkDuplicates RG:Z:1 NM:i:1 MQ:i:60 AS:i:20 XS:i:26

NB501013:4:HHTTKBGXX:4:11608:21475:5974 99 gi|966749122|ref|NC_006097.4| 500444 40 93S25M32S = 500581 251 TGAGGCTATAGGGGGTGTTTGGAGGATGTGGTCCTCTAAAGGGCTGTGTGGGTCCATAAGATGATCTGGAGCTATAGGGTGATCTCTAAAGGGATGTGGGGCTAGAAGGCTCTCCCTTCCTTCCTTCCCCCCCTTCCACAGCAGGGAGGG AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEE<EEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEE MC:Z:114M37S PG:Z:MarkDuplicates RG:Z:1 NM:i:1 MQ:i:60 AS:i:20 XS:i:26

NB501013:4:HHTTKBGXX:2:11110:7932:9174 181 gi|966749122|ref|NC_006097.4| 500575 0 * = 500575 0 ATCCCCTAATAGCCCACCCCACAGCGCCTTATCGCCCTATGGCCCCACACAGCCCCTTATAGATCCATATCCCCCATACACTCCCTGTAGCCCCATATATCCTTATCGCCACCCCTATACACCCCCCACAGCCCCCTATACCCCACATTAC <<EE</AA/E6/E<A/AEE<///EAEEEEE/AA/</A/<E/AEEAA/E<EA/AEAAE</EA/AEE/EEEEEEEA/E/EEAE/EEEEEEEEEEEEAEE/EEEEE6EEEEE6E//EEEE/EEAAEAAE6AE6E6AEEE/E/6AAAAAEAA/AA PG:Z:MarkDuplicates RG:Z:1 MQ:i:60 AS:i:0 XS:i:0

NB501013:4:HHTTKBGXX:2:13307:14052:6749 117 gi|966749122|ref|NC_006097.4| 500575 0 * = 500575 0 CCCTTAAAGATCCATATCCCCCATACACTCCCTGTAGCCCCATATATCCTTATAGCCACTCCTATCCACCCCCCACCGCCCCCTATACCCCACATTACCCAACAGCCCCCCCGTATCGCCCCCACAGCCCCCAAAACTCCCATATCCCCAC AEE////</</EEA///EEEEA///E/E/EEE/</</EEEEE/AA<AEE//A/AAEE/EEEE/AAEEEEEEEEEAEE6EEEEE6EA/EEEEAEAA/E<EEA6EEAEEEEEEEE/E/EAEEEEEEEEAEEEEEEEEEEEEEEEEEEEAAAAA PG:Z:MarkDuplicates RG:Z:1 MQ:i:60 AS:i:0 XS:i:0

I looked through the first 1 M reads, and almost all of them are mapped. For example, there are a lot of reads with the flag number 99, which according to broadinstitute means it is read paired, read mapped in proper pair, mate reverse strand, first in pair.

In terms of insert sizes, I calculated the mean insert size using a bash script that I found here and got a mean of 23501.5. Although I am a beginner at bash, so I am not sure if the script is accurate. But most numbers in column 9 (TLEN) are above 0 and below and I know TLEN is comparable to insert size.

standielpls avatar Aug 24 '16 17:08 standielpls

Can you send me a bam with the first 1M alignments?

On Wed, Aug 24, 2016 at 11:15 AM, Stanley Chin [email protected] wrote:

Here are some examples of reads in my bam file:

NB501013:4:HHTTKBGXX:4:11511:1269:15331 99 gi|966749122|ref|NC_006097.4| 500444 40 90S25M36S = 500584 251 GGCTATAGGGGGTGTTTGGAGGATGTGGTC CTCTAAAGGGCTGTGTGGGTCCATAAGATGATCTGGAGCTATAGGGTGATCTCTAAAGGG ATGTGGGGCTAGAAGGCTCTCCCTTCCTTCCTTCCCCCCCTTCCACAGCAGGGAGGGTGAA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEAEEEEEEEEEEEEEEE/ EEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEE/EEAEEEEAAEEEEAEEEA6<EAEEEEEEEEEEEA MC:Z:111M40S PG:Z:MarkDuplicates RG:Z:1 NM:i:1 MQ:i:60 AS:i:20 XS:i:26

NB501013:4:HHTTKBGXX:4:11608:21475:5974 99 gi|966749122|ref|NC_006097.4| 500444 40 93S25M32S = 500581 251 TGAGGCTATAGGGGGTGTTTGGAGGATGTG GTCCTCTAAAGGGCTGTGTGGGTCCATAAGATGATCTGGAGCTATAGGGTGATCTCTAAA GGGATGTGGGGCTAGAAGGCTCTCCCTTCCTTCCTTCCCCCCCTTCCACAGCAGGGAGGG AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/ EEEEEEEEEEEEEEAEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEAEEEEEEAEEEEE< EEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEE MC:Z:114M37S PG:Z:MarkDuplicates RG:Z:1 NM:i:1 MQ:i:60 AS:i:20 XS:i:26

NB501013:4:HHTTKBGXX:2:11110:7932:9174 181 gi|966749122|ref|NC_006097.4| 500575 0 * = 500575 0 ATCCCCTAATAGCCCACCCCACAGCGCCTT ATCGCCCTATGGCCCCACACAGCCCCTTATAGATCCATATCCCCCATACACTCCCTGTAG CCCCATATATCCTTATCGCCACCCCTATACACCCCCCACAGCCCCCTATACCCCACATTAC <<EE</AA/E6/E<A/AEE<///EAEEEEE/AA/</A/<E/AEEAA/E<EA/ AEAAE</EA/AEE/EEEEEEEA/E/EEAE/EEEEEEEEEEEEAEE/EEEEE6EEEEE6E/ /EEEE/EEAAEAAE6AE6E6AEEE/E/6AAAAAEAA/AA PG:Z:MarkDuplicates RG:Z:1 MQ:i:60 AS:i:0 XS:i:0

NB501013:4:HHTTKBGXX:2:13307:14052:6749 117 gi|966749122|ref|NC_006097.4| 500575 0 * = 500575 0 CCCTTAAAGATCCATATCCCCCATACACTC CCTGTAGCCCCATATATCCTTATAGCCACTCCTATCCACCCCCCACCGCCCCCTATACCC CACATTACCCAACAGCCCCCCCGTATCGCCCCCACAGCCCCCAAAACTCCCATATCCCCAC AEE////</</EEA///EEEEA///E/E/EEE/</</EEEEE/AA<AEE//A/AAEE/ EEEE/AAEEEEEEEEEAEE6EEEEE6EA/EEEEAEAA/E<EEA6EEAEEEEEEEE/E/ EAEEEEEEEEAEEEEEEEEEEEEEEEEEEEAAAAA PG:Z:MarkDuplicates RG:Z:1 MQ:i:60 AS:i:0 XS:i:0

I looked through the first 1 M reads, and almost all of them are mapped. For example, there are a lot of reads with the flag number 99, which according to broadinstitute https://broadinstitute.github.io/picard/explain-flags.html means it is read paired, read mapped in proper pair, mate reverse strand, first in pair.

In terms of insert sizes, I calculated the mean insert size using a bash script that I found here https://www.biostars.org/p/16556/#45135 and got a mean of 23501.5. Although I am a beginner at bash, so I am not sure if the script is accurate. But most numbers in column 9 (TLEN) are above 0 and below and I know TLEN is comparable to insert size.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arq5x/lumpy-sv/issues/140#issuecomment-242141509, or mute the thread https://github.com/notifications/unsubscribe-auth/AAlDUS7kvcg2PBtHNIBvQbHZQWxvF65Vks5qjHw9gaJpZM4JqD7_ .

Ryan Layer

ryanlayer avatar Aug 25 '16 21:08 ryanlayer

Hi Ryan, what is your email? I can't seem to find it anywhere and the file is too big to upload onto Github.

standielpls avatar Aug 26 '16 15:08 standielpls

Gmail, ryan dot layer

On Aug 26, 2016, at 9:01 AM, Stanley Chin [email protected] wrote:

Hi Ryan, what is your email? I can't seem to find it anywhere and the file is too big to upload onto Github.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

ryanlayer avatar Aug 26 '16 15:08 ryanlayer

Hello, I am having the exact same problem, what could be done?

Thanks a lot

ghost avatar Nov 08 '16 16:11 ghost

Can you rerun with the verbose flag on (-v) and reply with the output?

On Tue, Nov 8, 2016 at 9:16 AM, aderzelle [email protected] wrote:

Hello, I am having the exact same problem, what could be done?

Thanks a lot

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/arq5x/lumpy-sv/issues/140#issuecomment-259181660, or mute the thread https://github.com/notifications/unsubscribe-auth/AAlDUWH-2sZiu1QcG3RDJVApdRmcs29fks5q8KBpgaJpZM4JqD7_ .

Ryan Layer

ryanlayer avatar Nov 08 '16 16:11 ryanlayer

Hi, I am currently having the same problem. Is the issue solved via running -v on the initial BAM file generation step?

jftchau avatar May 12 '20 08:05 jftchau

I am having the same problem - any fix for this?

Guswaneka avatar Jun 02 '20 04:06 Guswaneka