emeraLD
Seg fault on large VCF
Hi, I wanted to try this tool as it seems very promising. I have several hundred human samples with SNPs called and wanted to run emeraLD. However, within seconds I get a seg fault:
NOTE: genotype data appear to be unphased reporting genotype LD rather than haplotype LD use "--phased" option to override this behaviour
./run_LDanalysis.sh: line 14: 15620 Segmentation fault ~/mydir/programs/emeraLD/bin/emeraLD -i $reads --out output_chr22.txt --region chr22:10511578-20511578
where $reads is my bgzipped and indexed VCF file. I also run into the same problem when I don't define a region. Here is a short sample of the VCF file (without the header), showing just the first two samples. As you can see, the genotypes are not phased.
Please let me know what is going wrong, as I would really like to run emeraLD on this data set. Thanks, Fritz
chr22 10511193 . T C 56 . . GT:GQ:PL:DP:RR:VR:FT:RNC ./.:.:0,27,30:0:0:0:No_data:.. 1/1:.:27,3,0:2:0:2:low_coverage;low_Var
chr22 10511228 . T A 98 . . GT:GQ:PL:DP:RR:VR:FT:RNC ./.:.:0,27,30:0:0:0:No_data:.. 0/0:.:0,35,114:2:2:0:No_var:.. ./.:.:0
chr22 10511254 . A G 87 . . GT:GQ:PL:DP:RR:VR:FT:RNC 0/0:.:0,32,80:1:1:0:No_var:.. 0/0:.:0,35,114:2:2:0:No_var:.. ./.:.:0
chr22 10511255 . C A 73 . . GT:GQ:PL:DP:RR:VR:FT:RNC 0/0:.:0,32,80:1:1:0:No_var:.. 0/0:.:0,35,114:2:2:0:No_var:.. ./.:.:0
chr22 10511270 . T A 229 . . GT:GQ:PL:DP:RR:VR:FT:RNC 0/0:.:0,32,80:1:1:0:No_var:.. 0/0:.:0,35,114:2:2:0:No_var:.. ./.:.:0
Chances are you need to run make clean && make from the source directory. The executable checked into the bin directory is most likely dynamically linked to system libs.
Hi,
Thanks for contacting us about this. I'm not sure if it's the only issue here, but unfortunately emeraLD does not currently support missing genotypes. Genotype imputation (e.g., using https://imputationserver.sph.umich.edu or https://genome.sph.umich.edu/wiki/Minimac3) is probably the best way to work around this for the time being.
Thanks, Corbin
On Sun, Aug 5, 2018 at 12:15 PM Jonathon LeFaive [email protected] wrote:
Chances are you need to run make clean && make from the source directory. The executable checked into the bin directory is most likely dynamically linked to system libs.
I think if you run with --no-phase it will work, but as Corbin says, the program will ignore missing genotypes, so you may want to do imputation or only include variants with a high call rate.
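For the call-rate option, here is a minimal Python sketch of what such a filter could look like. It is not part of emeraLD; the function names `call_rate` and `filter_by_call_rate` and the 0.95 threshold are illustrative assumptions, and it assumes a standard tab-delimited VCF where a genotype counts as missing if any allele is ".".

```python
def call_rate(vcf_line):
    """Return the fraction of samples with a fully called GT on one VCF data line.

    Assumes a tab-delimited VCF record with a FORMAT column containing GT.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    gt_idx = fields[8].split(":").index("GT")  # position of GT within FORMAT
    samples = fields[9:]
    called = 0
    for sample in samples:
        gt = sample.split(":")[gt_idx]
        # Treat phased ("|") and unphased ("/") separators the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." not in alleles:
            called += 1
    return called / len(samples) if samples else 0.0


def filter_by_call_rate(lines, min_rate=0.95):
    """Yield header lines unchanged and data lines whose call rate is >= min_rate.

    min_rate=0.95 is an arbitrary illustrative default, not a recommendation.
    """
    for line in lines:
        if line.startswith("#") or call_rate(line) >= min_rate:
            yield line
```

A bgzipped VCF can generally be read line by line with Python's `gzip.open(path, "rt")`; the filtered output would need to be bgzipped and indexed again before use.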
@corbinq - I think there's also a small bug in auto-detecting unphased VCFs, see 8d74f6070a8. I'll send a PR if that looks okay to you.
Thanks everyone. I am traveling but will try that and Minimac4 for these files. Thanks Fritz
OK, I am sorry, guys, this is a very nooby question and it's not necessarily related to emeraLD.
So I have this VCF file with just called SNPs from several hundred human samples, and I want to study LD across these samples. What is the way to go? Or can you point me to a review, paper, summary, or set of instructions? Thanks for your help, Fritz