Peter Krusche comments




Benchmarking my dataset

Hi, I think hap.py should also work with other datasets. At minimum, you need these files:
* a truth VCF file
* a query VCF file
* a reference FASTA...
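Here is a minimal sketch of how those three inputs map onto a hap.py call. It is a small Python helper that only assembles the command line. The helper name `build_happy_command` and the file names are hypothetical. I'm assuming hap.py's usual layout: truth and query VCF as positional arguments, `-r` for the reference FASTA, and `-o` for the output prefix. Optional inputs such as a confident-regions BED file are left out.

```python
import shlex
from typing import List


def build_happy_command(truth_vcf: str, query_vcf: str,
                        reference_fasta: str, output_prefix: str) -> List[str]:
    """Assemble a minimal hap.py argument list (hypothetical helper).

    Assumes truth/query VCFs are positional, -r is the reference FASTA and
    -o is the output prefix; optional inputs are not handled here.
    """
    required = {"truth VCF": truth_vcf, "query VCF": query_vcf,
                "reference FASTA": reference_fasta}
    missing = [name for name, path in required.items() if not path]
    if missing:
        raise ValueError("missing required input(s): " + ", ".join(missing))
    return ["hap.py", truth_vcf, query_vcf,
            "-r", reference_fasta, "-o", output_prefix]


if __name__ == "__main__":
    cmd = build_happy_command("truth.vcf.gz", "query.vcf.gz", "ref.fa", "out/mydata")
    # Print a shell-quoted version that could be pasted into a terminal.
    print(shlex.join(cmd))
```

If hap.py is on the `PATH`, the returned list can be passed directly to `subprocess.run(cmd, check=True)`.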

Standardizing performance metrics output file

Here are my slides from the last benchmarking call related to this issue: https://docs.google.com/presentation/d/1VCguvdhaSJI0z7Vbn_oyBYdoYsMzqMyjlTIroHoLBks/edit?usp=sharing Also, the proposed output format in there is now supported by hap.py 0.3.0 and is documented...

Standardizing performance metrics output file

Also, here are some comments w.r.t. the differences between hap.py and the metrics definitions document:
- hap.py outputs TRUTH.TP and QUERY.TP. Since (VCF-based) counts can be different depending on the...
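This example shows why both TP counts matter. In the GA4GH-style definitions, as I understand them, recall is computed only from truth-side counts and precision only from query-side counts. A variant that is represented by a different number of records in truth and query therefore never mixes the two representations within one ratio. The class and function names below are hypothetical, and the counts in the example are made up.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SummaryCounts:
    """Counts named after hap.py's TRUTH.* and QUERY.* summary columns."""
    truth_tp: int
    truth_fn: int
    query_tp: int
    query_fp: int


def safe_ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def metrics(c: SummaryCounts) -> Dict[str, float]:
    # Recall uses truth-side counts only ...
    recall = safe_ratio(c.truth_tp, c.truth_tp + c.truth_fn)
    # ... and precision uses query-side counts only.
    precision = safe_ratio(c.query_tp, c.query_tp + c.query_fp)
    f1 = safe_ratio(2 * precision * recall, precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Hypothetical: one complex truth variant is represented as two query
    # records, so TRUTH.TP and QUERY.TP differ by one.
    print(metrics(SummaryCounts(truth_tp=95, truth_fn=5, query_tp=96, query_fp=4)))
```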

Records in intermediate VCF format

@bioinformed: about hap.py / xcmp, they will implement the new intermediate format soon, probably in February (it started out similar to what hap.py is writing, but changed during the...

Records in intermediate VCF format

In the matching case, if the comparison tool chooses to not split any input variants, I guess the only way to output the result is to print the records as...
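Whatever record layout is chosen, a consumer of the intermediate file would read the per-record decisions from per-sample FORMAT fields. The sketch below is rough and assumes a hypothetical layout: the FORMAT column carries a `BD` (decision) key, and the two sample columns are named `TRUTH` and `QUERY`. The final format may differ. In the example, an unsplit record that matches in both callsets carries `TP` in both samples. A missed truth variant carries `FN` in `TRUTH` and a missing value in `QUERY`.

```python
from collections import Counter
from typing import Dict, Iterable, List


def tally_decisions(vcf_lines: Iterable[str]) -> Dict[str, Counter]:
    """Count per-sample BD decisions in an intermediate-format VCF.

    Assumed (hypothetical) layout: FORMAT contains a BD key, and the
    #CHROM header names the sample columns (e.g. TRUTH and QUERY).
    """
    samples: List[str] = []
    counts: Dict[str, Counter] = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]
            counts = {s: Counter() for s in samples}
            continue
        keys = fields[8].split(":")
        if "BD" not in keys:
            continue
        bd_index = keys.index("BD")
        for sample, value in zip(samples, fields[9:]):
            parts = value.split(":")
            # Skip missing decisions ("." or absent trailing subfields).
            if bd_index < len(parts) and parts[bd_index] not in (".", ""):
                counts[sample][parts[bd_index]] += 1
    return counts


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTRUTH\tQUERY\n",
        "chr1\t100\t.\tA\tG\t.\t.\t.\tGT:BD\t0/1:TP\t0/1:TP\n",
        "chr1\t200\t.\tC\tT\t.\t.\t.\tGT:BD\t0/1:FN\t.:.\n",
    ]
    print({s: dict(c) for s, c in tally_decisions(example).items()})
```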

hap.py annotated VCF file as output issues running with docker

Can you paste the column header line of that VCF file? Maybe there is a spurious tab / space at the end of the `#CHROM ...` line?
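You can run a quick check for this yourself with the hypothetical Python helper `check_chrom_header` below. It reads a plain-text VCF and inspects the `#CHROM` line for trailing whitespace, empty column names, or too few tab-separated columns. For a bgzipped file, open it with `gzip.open(path, "rt")` instead of `open(path)`.

```python
from typing import Iterable, Optional


def check_chrom_header(lines: Iterable[str]) -> Optional[str]:
    """Describe a problem with the #CHROM header line, or return None if it looks fine."""
    for line in lines:
        if not line.startswith("#CHROM"):
            continue
        stripped = line.rstrip("\r\n")
        if stripped != stripped.rstrip(" \t"):
            return "trailing whitespace at end of #CHROM line"
        columns = stripped.split("\t")
        if "" in columns:
            return "empty column name (doubled tab?)"
        if len(columns) < 8:
            # The fixed VCF columns are CHROM..INFO; spaces instead of tabs end up here.
            return "fewer than 8 tab-separated columns"
        return None
    return "no #CHROM header line found"


if __name__ == "__main__":
    demo = ["##fileformat=VCFv4.2\n",
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE\t\n"]
    print(check_chrom_header(demo))
```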

Warning in log file ("Too many AD fields") for gatk-vcf only

I suspect that decomposition somehow doesn’t get the AD values quite right. I think the easiest way to fix this would be to drop the AD fields in advance using...
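The tool named at the end of that comment is cut off above, so this is only a plain-Python sketch of the idea. It removes the `AD` key from the FORMAT column, removes its values from every sample column, and drops its `##FORMAT` header definition. The helper `drop_format_key` is hypothetical. For real files, a dedicated VCF toolkit is probably the safer choice.

```python
from typing import Optional


def drop_format_key(line: str, key: str = "AD") -> Optional[str]:
    """Remove one FORMAT key from a VCF line (hypothetical helper).

    Returns None for the key's own ##FORMAT header line (drop it), the line
    unchanged if there is nothing to remove, and a rewritten line otherwise.
    """
    if line.startswith("##FORMAT=<ID=" + key + ","):
        return None
    if line.startswith("#"):
        return line
    ending = "\n" if line.endswith("\n") else ""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return line  # no FORMAT/sample columns to edit
    keys = fields[8].split(":")
    if key not in keys or len(keys) == 1:
        return line  # key absent, or removing it would leave FORMAT empty
    idx = keys.index(key)
    fields[8] = ":".join(keys[:idx] + keys[idx + 1:])
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # Trailing sample subfields may be omitted, so only delete if present.
        if idx < len(values):
            del values[idx]
        fields[col] = ":".join(values)
    return "\t".join(fields) + ending


if __name__ == "__main__":
    record = "chr1\t100\t.\tA\tG\t.\t.\t.\tGT:AD:DP\t0/1:10,5:15\n"
    print(drop_format_key(record), end="")
```

To process a whole file, write out every line for which the helper does not return `None`.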

Support for allele only comparisons

Hap.py lets you do this via the `--set-gt` option:
```
--set-gt {half,hemi,het,hom}
    This is used to treat Strelka somatic files Possible values for this parameter: half / hemi / het...
```

Support for allele only comparisons

Another way to get an idea of the allele-only performance is to run in standard mode and use the FP.GT value. FP.GT gives the number of query calls which the...
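Here is a sketch of that idea as arithmetic. It assumes (this is not stated in full above) that FP.GT counts query calls whose alleles match the truth but whose genotype does not, and that these calls are also included in QUERY.FP. Moving them to the TP side then gives an approximate allele-only precision. The function names and the counts in the example are hypothetical.

```python
def precision(query_tp: int, query_fp: int) -> float:
    """Standard (genotype-aware) precision from query-side counts."""
    total = query_tp + query_fp
    return query_tp / total if total else 0.0


def allele_only_precision(query_tp: int, query_fp: int, fp_gt: int) -> float:
    """Approximate precision when genotype mismatches count as matches.

    Assumes fp_gt is a subset of query_fp, so those calls move to the TP side.
    """
    if not 0 <= fp_gt <= query_fp:
        raise ValueError("expected 0 <= fp_gt <= query_fp")
    total = query_tp + query_fp
    return (query_tp + fp_gt) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts, not taken from any real run.
    print(precision(900, 100))                  # 0.9
    print(allele_only_precision(900, 100, 60))  # 0.96
```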

Support for allele only comparisons

For a 1/2 call, `--set-gt hom` would produce two 1/1 records, one for each allele. Of course these cannot be haplotype matched anymore if they overlap on the reference after...
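The following is a toy model of the behavior described above, not hap.py's implementation. It turns one call into one `1/1` record per ALT allele present in its genotype. The record tuple layout and the helper `split_to_hom` are hypothetical. In this simple model, both records produced from a `1/2` call start at the same position, so they overlap on the reference.

```python
from typing import List, Tuple

# Hypothetical record layout: (chrom, pos, ref, alts, genotype)
Record = Tuple[str, int, str, List[str], str]


def split_to_hom(chrom: str, pos: int, ref: str,
                 alts: List[str], gt: str) -> List[Record]:
    """Emit one homozygous (1/1) record for each non-reference allele in gt."""
    # Accept unphased ("/") and phased ("|") separators; skip ref and missing alleles.
    indices = sorted({int(a) for a in gt.replace("|", "/").split("/")
                      if a not in (".", "0")})
    # Each new record keeps a single ALT allele, so its allele index becomes 1.
    return [(chrom, pos, ref, [alts[i - 1]], "1/1") for i in indices]


if __name__ == "__main__":
    for rec in split_to_hom("chr1", 100, "A", ["C", "T"], "1/2"):
        print(rec)
```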