gatk
gatk copied to clipboard
Initial support for spanning deletions and refactoring for testability
Spanning Deletion Changes
- Created a new SpanningDeletionRecord as a subclass of ReferenceRecord but allows us to store GT and GQ
- in handlePotentialSpanningDeletion when processing a deletion, we create a new SpanningDeletionRecord with the correct GT and GQ based on the deletion
- When processing reference data at a variant site, return ReferenceRecords/SpanningDeletionRecord instead of just a string "state" since we need more than just state now
- Because of the above, we are now returning an object for the inferred state instead of just a string. Since the inferred state is so, so common a singleton InferredReferenceRecord was created
- Processing of spanning deletions beyond
Ugly
- The construction of the singleton is ugly because it requires a location even though we don't for this case. We could go to an tagging interface (like Cloneable) these all implement, but that seems ugly also
Refactoring Changes One of the challenges with this PR was testing as the work is really done in the lower-level methods and it would be nice to have this as a unit test rather than an integration/end-to-end test. This motivated the following changes:
- don't write to VCF directly, instead have take a Consumer<VariantContext> to emit VariantContexts. This let's us provide a different consumer in unit tests to collect our result.
- we previously had a chain of calls createVariantsFromSortedRanges -> processSampleRecordsForLocation -> finalizeCurrentVariant that returned void and as a side effect wrote to VCF. These deeper methods now return a VariantContext and the writing (via consumer) is done higher up in the call stack
- made some private methods package-private so we could call them from tests
Thinking Out Loud
We have three different sets of datastructures for the same data, some of this is history, some is performance/memory, but could use some rethinking
- GenericRecord (pulling from BQ)
- ReferenceRecord/SpanningDeletionRecord (in memory data structure for reference data)
- ExtractCohortRecord (pre-variant context record)
Codecov Report
:exclamation: No coverage uploaded for pull request base (
ah_var_store@8781b56
). Click here to learn what that means. The diff coverage isn/a
.
@@ Coverage Diff @@
## ah_var_store #7857 +/- ##
================================================
Coverage ? 86.240%
Complexity ? 35196
================================================
Files ? 2173
Lines ? 165018
Branches ? 17793
================================================
Hits ? 142311
Misses ? 16380
Partials ? 6327
Github actions tests reported job failures from actions build 2347277957 Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
unit | 11 | 2347277957.13 | logs |
unit | 8 | 2347277957.1 | logs |
Github actions tests reported job failures from actions build 2347281121 Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
unit | 11 | 2347281121.13 | logs |
unit | 8 | 2347281121.1 | logs |
Github actions tests reported job failures from actions build 2360686024 Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
unit | 11 | 2360686024.13 | logs |
unit | 8 | 2360686024.1 | logs |
PR at https://github.com/broadinstitute/gatk/pull/7857
I grok the refactoring changes and those are 👍, but I think I'm going to need a walkthrough for the core spanning deletion work...
Github actions tests reported job failures from actions build 2488811363 Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
unit | 11 | 2488811363.13 | logs |
unit | 8 | 2488811363.1 | logs |
Github actions tests reported job failures from actions build 2488817096 Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
unit | 11 | 2488817096.13 | logs |
unit | 8 | 2488817096.1 | logs |
Github actions tests reported job failures from actions build 2545363458 Failures in the following jobs:
Test Type | JDK | Job ID | Logs |
---|---|---|---|
unit | 11 | 2545363458.13 | logs |
unit | 8 | 2545363458.1 | logs |