emeraLD

Seg fault on large VCF

Open fritzsedlazeck opened this issue 6 years ago • 6 comments

Hi, I wanted to try this tool as it seems very promising. I have several hundred human samples with SNPs called and wanted to run emeraLD. However, within seconds I get a seg fault:

NOTE: genotype data appear to be unphased
reporting genotype LD rather than haplotype LD
use "--phased" option to override this behaviour
./run_LDanalysis.sh: line 14: 15620 Segmentation fault ~/mydir/programs/emeraLD/bin/emeraLD -i $reads --out output_chr22.txt --region chr22:10511578-20511578

where $reads is my bgzipped and indexed VCF file. I run into the same problem when I don't define a region. Here is a short sample of the VCF file (without the header), showing just the first two samples. As you can see, the genotypes are not phased.

Please let me know what is going wrong, as I would really like to run emeraLD on this data set. Thanks Fritz

chr22   10511193        .       T       C       56      .       .       GT:GQ:PL:DP:RR:VR:FT:RNC        ./.:.:0,27,30:0:0:0:No_data:..  1/1:.:27,3,0:2:0:2:low_coverage;low_Var
chr22   10511228        .       T       A       98      .       .       GT:GQ:PL:DP:RR:VR:FT:RNC        ./.:.:0,27,30:0:0:0:No_data:..  0/0:.:0,35,114:2:2:0:No_var:..  ./.:.:0
chr22   10511254        .       A       G       87      .       .       GT:GQ:PL:DP:RR:VR:FT:RNC        0/0:.:0,32,80:1:1:0:No_var:..   0/0:.:0,35,114:2:2:0:No_var:..  ./.:.:0
chr22   10511255        .       C       A       73      .       .       GT:GQ:PL:DP:RR:VR:FT:RNC        0/0:.:0,32,80:1:1:0:No_var:..   0/0:.:0,35,114:2:2:0:No_var:..  ./.:.:0
chr22   10511270        .       T       A       229     .       .       GT:GQ:PL:DP:RR:VR:FT:RNC        0/0:.:0,32,80:1:1:0:No_var:..   0/0:.:0,35,114:2:2:0:No_var:..  ./.:.:0
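For anyone triaging a similar file, the two properties visible in the records above (unphased `/` separators and missing `./.` calls) can be counted per record. This is a minimal sketch, not part of emeraLD: `summarize_genotypes` is a hypothetical helper, it assumes GT is listed in the FORMAT column, and it splits on any whitespace because the sample pasted here has lost its tabs.

```python
def summarize_genotypes(vcf_line):
    """Return (n_phased, n_unphased, n_missing) for one VCF data record.

    Hypothetical helper for illustration only. Splits on any whitespace,
    which is fine for the records shown in this thread.
    """
    fields = vcf_line.split()
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    n_phased = n_unphased = n_missing = 0
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        if "." in gt:          # e.g. "./." or ".|." means at least one allele is missing
            n_missing += 1
        elif "|" in gt:        # "|" separates phased alleles
            n_phased += 1
        else:                  # "/" (or a haploid call) is counted as unphased
            n_unphased += 1
    return n_phased, n_unphased, n_missing


record = (
    "chr22\t10511254\t.\tA\tG\t87\t.\t.\tGT:GQ:PL:DP:RR:VR:FT:RNC\t"
    "0/0:.:0,32,80:1:1:0:No_var:..\t./.:.:0,27,30:0:0:0:No_data:.."
)
print(summarize_genotypes(record))
```

A non-zero third count means the record contains missing genotypes, which matters given the replies below.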

fritzsedlazeck avatar Aug 05 '18 15:08 fritzsedlazeck

Chances are you need to run make clean && make from the source directory. The executable checked into the bin directory is most likely dynamically linked to system libs.

jonathonl avatar Aug 05 '18 16:08 jonathonl

Hi,

Thanks for contacting us about this. I'm not sure if it's the only issue here, but unfortunately emeraLD does not currently support missing genotypes. Genotype imputation (e.g., using https://imputationserver.sph.umich.edu or https://genome.sph.umich.edu/wiki/Minimac3) is probably the best way to work around this for the time being.

Thanks, Corbin


corbinq avatar Aug 05 '18 22:08 corbinq

I think if you run with --no-phase it will work, but as Corbin says, the program will ignore missing genotypes, so you may want to do imputation or only include variants with a high call rate.
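The call-rate suggestion could be sketched as a small pre-filter over the VCF text. This is only an illustration under stated assumptions: `call_rate` and `filter_by_call_rate` are hypothetical helpers, the 0.95 default is an arbitrary example threshold, and the code reads plain-text lines (a bgzipped file would need to be decompressed first).

```python
def call_rate(vcf_line):
    """Fraction of samples with a fully called genotype in one VCF record."""
    fields = vcf_line.split()
    gt_index = fields[8].split(":").index("GT")
    samples = fields[9:]
    if not samples:
        return 0.0
    called = sum("." not in s.split(":")[gt_index] for s in samples)
    return called / len(samples)


def filter_by_call_rate(lines, min_call_rate=0.95):
    """Yield header lines unchanged and records meeting the threshold.

    min_call_rate=0.95 is an assumed example value, not a recommendation
    from the emeraLD authors.
    """
    for line in lines:
        if line.startswith("#") or call_rate(line) >= min_call_rate:
            yield line


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr22\t10511254\t.\tA\tG\t87\t.\t.\tGT\t0/0\t0/1",
    "chr22\t10511255\t.\tC\tA\t73\t.\t.\tGT\t0/0\t./.",
]
kept = list(filter_by_call_rate(example, min_call_rate=1.0))
```

Setting `min_call_rate=1.0` keeps only variants with no missing genotypes at all, which is the strictest reading of the advice above.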

welchr avatar Aug 08 '18 15:08 welchr

@corbinq - I think there's also a small bug in auto-detecting unphased VCFs, see 8d74f6070a8. I'll send a PR if that looks okay to you.

welchr avatar Aug 08 '18 15:08 welchr

Thanks everyone. I am traveling but will try that and Minimac4 for these files. Thanks Fritz

fritzsedlazeck avatar Aug 08 '18 15:08 fritzsedlazeck

OK, I am sorry guys, this is a very nooby question and it's not necessarily related to emeraLD.

So I have this VCF file with just called SNPs for several hundred human samples, and I want to study LD across these samples. What is the way to go? Or can you point me to a review, paper, summary, or instructions? Thanks for your help Fritz

fritzsedlazeck avatar Aug 13 '18 22:08 fritzsedlazeck
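For orientation on what "genotype LD" means for unphased data: one common estimator is the squared Pearson correlation between the alternate-allele dosages (0, 1, or 2) of two variants across samples. The sketch below illustrates that definition only. `genotype_r2` is a hypothetical function, it is not a description of how emeraLD computes LD, and it assumes complete genotypes with no missing calls.

```python
def genotype_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two dosage vectors (values 0, 1, 2).

    Hypothetical illustration; assumes no missing genotypes.
    Returns NaN if either variant is monomorphic in the sample.
    """
    n = len(dosages_a)
    if n == 0 or n != len(dosages_b):
        raise ValueError("dosage vectors must be non-empty and of equal length")
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined without variation
    return (cov * cov) / (var_a * var_b)


# Two variants whose dosages match in every sample are in perfect genotype LD.
print(genotype_r2([0, 1, 2, 1], [0, 1, 2, 1]))
```

Dedicated tools compute this for many variant pairs far more efficiently than a pairwise loop over this function would.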