
Error at the validation step

Open ssaif opened this issue 10 years ago • 7 comments

Hello,

I am trying to incorporate the ensemble approach in my bcbio analysis and am getting errors at the bcbio.variation command that validates the calls. Here are some details:

Run log - /gpfs/ngs/oncology/Analysis/external/EXT_001_NA12878/EDGE/NA12878_bcbio_NGv3bed/work/run.log

YAML file for bcbio.variation (to validate freebayes calls) - /gpfs/ngs/oncology/Analysis/external/EXT_001_NA12878/EDGE/NA12878_bcbio_NGv3bed/work/validate/NA12878_Germline_NGv3bed/freebayes/config/validate.yaml

Please let me know if you need additional information about the analysis.

Thanks, Sakina

ssaif avatar Sep 03 '14 19:09 ssaif

Sakina; Thanks for the report. Happy to look at this if you could make the log and validation files available at a Gist (https://gist.github.com/). Thanks much.

chapmanb avatar Sep 04 '14 09:09 chapmanb

Hi,

They are available here. Please let me know if you can access them.

https://gist.github.com/ssaif/fbb164d1f28b3f4133c3 (Error lines pasted with flanks from the run log)

https://gist.github.com/ssaif/40228395b0f50f9585e9 (YAML file for freebayes validation)

Thanks, Sakina

ssaif avatar Sep 04 '14 14:09 ssaif

Sakina; Thanks for the additional detail. It looks like something is wrong with one of your input VCF files, specifically that it has truncated lines. The code fails when it tries to access the reference allele to remove any gaps, and finds a line with fewer fields than expected:

https://github.com/chapmanb/bcbio.variation/blob/fc5bac476ec9d9efb79dfd07a07590e319d95ba2/src/bcbio/variation/normalize.clj#L571

It would be worth checking the input VCF to see if something is wrong:

bcftools view /gpfs/ngs/oncology/Analysis/external/EXT_001_NA12878/EDGE/NA12878_bcbio_NGv3bed/work/freebayes/NA12878_Germline_NGv3bed-effects-ploidyfix-filter.vcf.gz

This should spit out the file and perhaps give a better error message to help debug. Hope this helps some with identifying the issue.
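If bcftools reads the file without complaint, a direct scan for short records may help narrow things down. The following is only a minimal sketch, not part of bcbio.variation: the function names `open_text` and `find_short_lines` are hypothetical, and it assumes a tab-delimited VCF in which every record should have at least the 8 fixed columns (CHROM through INFO).

```python
import gzip

# Assumption: a valid VCF record has at least the 8 fixed columns
# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
MIN_VCF_FIELDS = 8


def open_text(path):
    """Open a plain or gzip/bgzip-compressed file in text mode (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def find_short_lines(path, min_fields=MIN_VCF_FIELDS):
    """Yield (line_number, field_count, text) for records with too few columns."""
    with open_text(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            # Skip blank lines and header lines; only variant records are checked.
            if not text or text.startswith("#"):
                continue
            field_count = len(text.split("\t"))
            if field_count < min_fields:
                yield line_number, field_count, text
```

Calling `list(find_short_lines(path))` on the VCF above would list any record that cannot supply a REF or INFO column, together with its line number in the file.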

chapmanb avatar Sep 05 '14 01:09 chapmanb

Hi Brad,

Thanks for the quick response. I did a few checks on the VCF file and it seems to check out OK.

Another thing I want to point out is that with this run of bcbio where I am also doing the ensemble step, I notice there are vcf files within each caller directory that seem to contain a combined call set (from all chromosomes). This is typically not seen in the run sans bcbio.variation. And the vcf file where you pointed out the error is one such combined calls file. Are these combined output files part of bcbio.variation run?

To test this, I will run bcbio.variation standalone on the per-chromosome calls, which will hopefully reproduce this behaviour/error.

Thanks, Sakina

ssaif avatar Sep 05 '14 21:09 ssaif

I forgot to mention that the freebayes VCF has no calls on chrM, because the Nimblegen BED file has no chrM regions. But the GiaB NIST VCF and BED files (hg19) that I am using to validate my calls do have chrM information (their contig order starts with chrM). Could this be the cause of the bcbio.variation error I am getting?

Thanks, Sakina

ssaif avatar Sep 05 '14 21:09 ssaif

This was using bcbio version 0.8.1a (alpha); I'm not sure if I mentioned this earlier.

Thanks, Sakina

ssaif avatar Sep 05 '14 21:09 ssaif

Sakina; Thanks for looking into this more. I added better debugging to a snapshot release of bcbio.variation. If you could download this and replace the existing version, it should hopefully report the exact line in the VCF where it fails:

wget https://github.com/chapmanb/bcbio.variation/releases/download/v0.1.8-SNAPSHOT-20140906/bcbio.variation-0.1.8-SNAPSHOT-standalone.jar
mv bcbio.variation-0.1.8-SNAPSHOT-standalone.jar /group/ngs/src/bcbio-nextgen/0.8.1a/rhel6-x64/share/java/bcbio_variation/
rm /group/ngs/src/bcbio-nextgen/0.8.1a/rhel6-x64/share/java/bcbio_variation/bcbio.variation-0.1.7-standalone.jar

Regarding your other observations, the comparison handles cases where the regions differ between the input and reference calls. It only compares regions present in both, so this shouldn't be an issue. bcbio also prepares the combined VCFs independently of the bcbio.variation evaluation. That is done for all calling; the combined file is the final input, concatenated from the per-region files.
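To illustrate what comparing only in shared regions means, here is a minimal sketch. It is not how bcbio.variation implements it. It assumes each input is a dict mapping contig name to a sorted list of non-overlapping, half-open `(start, end)` intervals, and the names `intersect_intervals` and `shared_regions` are hypothetical.

```python
def intersect_intervals(first, second):
    """Intersect two sorted lists of non-overlapping half-open (start, end) intervals."""
    shared = []
    i = j = 0
    while i < len(first) and j < len(second):
        start = max(first[i][0], second[j][0])
        end = min(first[i][1], second[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval finishes first.
        if first[i][1] < second[j][1]:
            i += 1
        else:
            j += 1
    return shared


def shared_regions(calls, reference):
    """Keep only contigs present in both inputs, intersecting their intervals."""
    result = {}
    for contig in calls:
        if contig in reference:
            overlap = intersect_intervals(calls[contig], reference[contig])
            if overlap:
                result[contig] = overlap
    return result


# Illustrative coordinates only: the capture regions have no chrM, the reference does.
capture = {"chr1": [(100, 200), (300, 400)]}
reference = {"chrM": [(0, 16571)], "chr1": [(150, 350)]}
print(shared_regions(capture, reference))
# {'chr1': [(150, 200), (300, 350)]}
```

In this toy example chrM simply drops out of the comparison because it is absent from the capture regions, which is why a missing chrM on one side should not by itself cause a failure.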

Hope re-running with the updated code will help identify the problematic VCF line and shed more light on what is happening.

chapmanb avatar Sep 07 '14 02:09 chapmanb