
Subsetting for A-length annotations (e.g. AF, PRI, F1R2, F2R1),

Open ldgauthier opened this issue 2 years ago • 9 comments

accounting for incorrect input lengths
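
For readers unfamiliar with the term: in VCF, an A-length annotation (`Number=A`, such as AF) carries one value per ALT allele, so when alleles are dropped the values must be subset in step. The sketch below is a rough Python illustration of that idea, not the GATK implementation. The function name `subset_a_length_annotation` is hypothetical, and returning `None` when the input length is wrong is an assumption made for the example, not necessarily what this PR does.

```python
from typing import List, Optional, Sequence


def subset_a_length_annotation(
    values: Sequence[float],
    original_alts: Sequence[str],
    kept_alts: Sequence[str],
) -> Optional[List[float]]:
    """Return the values for kept_alts, or None if the input length is wrong."""
    # An A-length annotation must have exactly one value per original ALT allele.
    if len(values) != len(original_alts):
        # Assumption for this sketch: drop the annotation rather than guess.
        return None
    index_of = {allele: i for i, allele in enumerate(original_alts)}
    # Every kept allele must come from the original ALT list.
    if any(allele not in index_of for allele in kept_alts):
        raise ValueError("kept allele not present in original ALT alleles")
    # Pick the values in the order of the kept alleles.
    return [values[index_of[allele]] for allele in kept_alts]


# Three ALT alleles, keeping the first and the third.
print(subset_a_length_annotation([0.10, 0.02, 0.30], ["T", "G", "TA"], ["T", "TA"]))
# A malformed record: only two values for three ALT alleles.
print(subset_a_length_annotation([0.10, 0.02], ["T", "G", "TA"], ["T", "TA"]))
```

Looking values up by allele rather than by position keeps the result correct even if the kept alleles are reordered.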

ldgauthier avatar Mar 15 '22 18:03 ldgauthier

Travis reported job failures from build 38115. Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | openjdk8 | 38115.1 | logs |
| cloud | openjdk11 | 38115.14 | logs |
| unit | openjdk11 | 38115.13 | logs |
| integration | openjdk11 | 38115.12 | logs |
| unit | openjdk8 | 38115.3 | logs |
| variantcalling | openjdk8 | 38115.4 | logs |
| integration | openjdk8 | 38115.2 | logs |

gatk-bot avatar Mar 15 '22 19:03 gatk-bot

Hello @ldgauthier.

Just to let you know what we have tested so far and what worked:

1 - Reblock with the snapshot + pipeline with Gnarly (snapshot) --> works on diploids (doesn't work with sex chromosomes)

2 - Reblock with the snapshot + pipeline without Gnarly (snapshot) --> not working; it broke at GenotypeGVCFs

3 - Reblock with GATK 4.2.5 + pipeline with Gnarly (4.2.5) --> doesn't work

4 - Reblock with GATK 4.2.5 + pipeline without Gnarly (4.2.5) --> doesn't work

5 - Reblock with the snapshot + pipeline without Gnarly (4.2.5) --> works, even with sex chromosomes

The 5th is the way to go. It seems Reblock from GATK 4.2.5 has a bug that the snapshot no longer has.

I hope this helps.

LiviaMoura avatar Mar 15 '22 19:03 LiviaMoura

Yeah, there is a reblocking bug in 4.2.5.0. It's fixed in master, but we are waiting on a Google NIO fix for requester-pays buckets before we release a new version.

If the number of samples you have works with GenotypeGVCFs, then there's certainly nothing wrong with that. I also plan on taking a look at haploid genotypes in Gnarly this week, but I understand if you don't have a lot of faith in that considering how long the other fix has taken. :-D

ldgauthier avatar Mar 15 '22 20:03 ldgauthier

Haha, it's not about faith. For now, I really don't know why I should use Gnarly as a step instead of going straight to GenotypeGVCFs (with usegnarly = false). Since Gnarly describes itself as a "dirty" method, why should I use the "dirty" one if the other method, without Gnarly, is working?

If you can explain what the difference is, I'd be glad.

The only thing I know (because I read it in an issue) is that we need Reblock to work with Dragen samples, so I'm following those tips.

LiviaMoura avatar Mar 15 '22 20:03 LiviaMoura

This is a hot topic recently, so I already have a doc to compare and contrast: https://docs.google.com/document/d/1qws0owSEc0XGcZGAcxmBOEk8fiWS1Dnv4tvHNgC_xVU/edit?usp=sharing

Gnarly is still a "beta" tool. I wanted to add some way to reduce the number of alternate alleles, but that may be easier to do after this recent GenomicsDB update.

ldgauthier avatar Mar 16 '22 14:03 ldgauthier

Excellent, good job! I'm glad I read that it "is more sensitive to rare alleles at common sites". Since we work with rare diseases, this is gold. OK, I'm convinced that I need to wait for the Gnarly fix xD (or, for now, I'll use it on chr1-22 and run the sex chromosomes without it). Thank you very much.

LiviaMoura avatar Mar 16 '22 14:03 LiviaMoura

Github actions tests reported job failures from actions build 2203889963. Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| cloud | 8 | 2203889963.10 | logs |
| cloud | 11 | 2203889963.11 | logs |
| unit | 11 | 2203889963.13 | logs |
| integration | 11 | 2203889963.12 | logs |
| variantcalling | 8 | 2203889963.2 | logs |
| unit | 8 | 2203889963.1 | logs |
| integration | 8 | 2203889963.0 | logs |

gatk-bot avatar Apr 21 '22 19:04 gatk-bot

Codecov Report

Merging #7725 (dc2d48f) into master (b6a28d1) will decrease coverage by 73.022%. The diff coverage is 7.282%.

@@               Coverage Diff                @@
##              master     #7725        +/-   ##
================================================
- Coverage     86.941%   13.919%   -73.022%     
+ Complexity     36860      7450     -29410     
================================================
  Files           2211      2219         +8     
  Lines         173376    173843       +467     
  Branches       18710     18795        +85     
================================================
- Hits          150734     24197    -126537     
- Misses         16055    146992    +130937     
+ Partials        6587      2654      -3933     

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| ...bender/tools/walkers/variantutils/ReblockGVCF.java | 0.000% <0.000%> (-80.711%) | :arrow_down: |
| ...der/tools/walkers/variantutils/SelectVariants.java | 40.212% <0.000%> (-40.955%) | :arrow_down: |
| ...lbender/utils/variant/GATKVariantContextUtils.java | 26.945% <0.000%> (-60.279%) | :arrow_down: |
| ...s/variant/writers/ReblockingGVCFBlockCombiner.java | 0.000% <0.000%> (-77.083%) | :arrow_down: |
| ...lkers/genotyper/AlleleSubsettingUtilsUnitTest.java | 1.709% <0.000%> (-97.009%) | :arrow_down: |
| ...lkers/variantutils/ReblockGVCFIntegrationTest.java | 1.010% <0.000%> (-96.664%) | :arrow_down: |
| ...ools/walkers/variantutils/ReblockGVCFUnitTest.java | 2.333% <0.000%> (-96.667%) | :arrow_down: |
| ...tools/walkers/genotyper/AlleleSubsettingUtils.java | 26.871% <23.333%> (-56.252%) | :arrow_down: |
| ...nder/tools/walkers/genotyper/GenotypingEngine.java | 49.686% <50.000%> (-36.478%) | :arrow_down: |
| .../org/broadinstitute/hellbender/utils/IGVUtils.java | 0.000% <0.000%> (-100.000%) | :arrow_down: |
| ... and 1954 more | | |

codecov[bot] avatar Apr 21 '22 20:04 codecov[bot]

Github actions tests reported job failures from actions build 2340788459. Failures in the following jobs:

| Test Type | JDK | Job ID | Logs |
| --- | --- | --- | --- |
| unit | 11 | 2340788459.13 | logs |
| integration | 11 | 2340788459.12 | logs |
| unit | 8 | 2340788459.1 | logs |
| variantcalling | 8 | 2340788459.2 | logs |
| integration | 8 | 2340788459.0 | logs |

gatk-bot avatar May 17 '22 19:05 gatk-bot