
accuracy of truth and following benchmark

Open Overcraft90 opened this issue 1 year ago • 1 comments

Hi there,

I'm working with a non-model plant species (white lupine) for which I'm doing some benchmark tests comparing GATK and DV after pangenome alignment with Giraffe.

The idea is to select the best caller for subsequent population/domestication analyses of the species. However, recall, precision and F1 are extremely low for both tools. I wonder what might be causing this, and whether the benchmarking tool, hap.py, might be involved.
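For clarity about what the three metrics measure, here is a minimal sketch of how they follow from true-positive, false-positive and false-negative counts. The function name and the counts are hypothetical placeholders, not results from this benchmark and not hap.py's own code.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from raw variant counts.

    tp: calls that match the truth set
    fp: calls absent from the truth set
    fn: truth variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=120)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Low recall with reasonable precision points to variants missing from the calls (or extra variants in the truth), while low precision points the other way, so reporting the raw counts alongside the ratios helps narrow down the cause.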

The reference for this species is AMIGA. I renamed the chromosomes in the reference to the chr format, partly because of #1 and partly because it appears to be the standard notation expected by hap.py. For unplaced contigs I use the chrUn_ notation, as in the GRCh38 human reference.
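Since the contigs were renamed, one thing worth ruling out is a naming mismatch between the reference index and the truth or query VCFs. The sketch below compares contig names from the text of a `.fai` index and of an uncompressed VCF. All function names and the sample data are hypothetical, and it uses only the Python standard library.

```python
import re


def fai_contigs(fai_text):
    """Contig names from a FASTA index (.fai): first tab-separated column."""
    return {line.split("\t")[0] for line in fai_text.splitlines() if line.strip()}


def vcf_contigs(vcf_text):
    """Contig names from ##contig header lines and from the CHROM column."""
    names = set()
    for line in vcf_text.splitlines():
        if line.startswith("##contig="):
            match = re.search(r"ID=([^,>]+)", line)
            if match:
                names.add(match.group(1))
        elif line and not line.startswith("#"):
            names.add(line.split("\t")[0])
    return names


def contig_mismatches(fai_text, vcf_text):
    """Return (names only in the VCF, names only in the reference), sorted."""
    ref = fai_contigs(fai_text)
    vcf = vcf_contigs(vcf_text)
    return sorted(vcf - ref), sorted(ref - vcf)


# Made-up example data, for illustration only.
example_fai = "chr1\t1000\t6\t60\t61\nchrUn_scaffold1\t500\t1030\t60\t61\n"
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=Chr01,length=1000>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "Chr01\t10\t.\tA\tG\t.\tPASS\t.\n"
)
only_vcf, only_ref = contig_mismatches(example_fai, example_vcf)
print("only in VCF:", only_vcf)
print("only in reference:", only_ref)
```

If either list is non-empty for the truth VCF or for a caller's VCF, records on those contigs cannot be matched against the other file.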

Lastly, the truth set was built from two assemblies of the same individual. Both were generated from HiFi+Hi-C data (relative coverage of 50.90× and 130.16×) with hifiasm, scaffolded with RagTag and curated with JuicerTool.

If helpful, I can post stats for the assemblies (assembly-stats, compleasm and Merqury) as well as the values for all benchmark metrics. Let me know, thanks!

Overcraft90, Dec 05 '24 12:12

Any update on this issue, or any idea what the cause might be?

Overcraft90, Jan 09 '25 18:01