VCFHeader and VCFHeaderLine rewrite/refactoring
This PR is resurrected from the PR graveyard, updated and rebased. It contains:
- significant updates to VCFHeader and VCFHeaderLine class hierarchy
- significantly updated VCFHeader merging
- prerequisites for writing VCF4.3 and BCF2.2
- some source-level incompatibilities for VCFHeader and VCFHeaderLine (a detailed document is being prepared)
Fixes numerous issues, including:
- https://github.com/samtools/htsjdk/issues/250
- https://github.com/samtools/htsjdk/issues/251
- https://github.com/samtools/htsjdk/issues/1574
- https://github.com/samtools/htsjdk/issues/277
- https://github.com/samtools/htsjdk/issues/500
- https://github.com/samtools/htsjdk/issues/507
- https://github.com/samtools/htsjdk/issues/730
- https://github.com/samtools/htsjdk/issues/575
- https://github.com/samtools/htsjdk/issues/929
- https://github.com/samtools/htsjdk/issues/1573
- https://github.com/samtools/htsjdk/issues/1667
Codecov Report
Merging #1581 (3084903) into master (347c0ac) will increase coverage by
0.022%. The diff coverage is79.088%.
:exclamation: Current head 3084903 differs from pull request most recent head a4c529d. Consider uploading reports for the commit a4c529d to get more accurate results
:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.
@@ Coverage Diff @@
## master #1581 +/- ##
===============================================
+ Coverage 69.856% 69.878% +0.022%
- Complexity 9695 9832 +137
===============================================
Files 703 706 +3
Lines 37772 38022 +250
Branches 6139 6155 +16
===============================================
+ Hits 26386 26569 +183
- Misses 8929 8965 +36
- Partials 2457 2488 +31
| Impacted Files | Coverage Δ | |
|---|---|---|
| ...in/java/htsjdk/samtools/SAMSequenceDictionary.java | 75.000% <ø> (ø) |
|
| ...c/main/java/htsjdk/variant/vcf/VCFRecordCodec.java | 0.000% <0.000%> (ø) |
|
| src/main/java/htsjdk/variant/vcf/VCFCodec.java | 25.000% <22.222%> (-52.273%) |
:arrow_down: |
| src/main/java/htsjdk/variant/vcf/VCFUtils.java | 45.570% <28.571%> (-3.706%) |
:arrow_down: |
| src/main/java/htsjdk/variant/vcf/VCF3Codec.java | 57.895% <50.000%> (-5.520%) |
:arrow_down: |
| ...n/java/htsjdk/variant/vcf/VCFFormatHeaderLine.java | 66.667% <57.895%> (-3.333%) |
:arrow_down: |
| ...tsjdk/variant/variantcontext/writer/VCFWriter.java | 77.451% <65.000%> (-5.345%) |
:arrow_down: |
| ...main/java/htsjdk/variant/vcf/VCFHeaderVersion.java | 75.000% <66.667%> (+3.125%) |
:arrow_up: |
| ...n/java/htsjdk/variant/vcf/VCFSampleHeaderLine.java | 66.667% <66.667%> (-33.333%) |
:arrow_down: |
| ...main/java/htsjdk/variant/vcf/VCFAltHeaderLine.java | 72.727% <68.421%> (-27.273%) |
:arrow_down: |
| ... and 22 more |
Note: this needs to be resolved/updated to account for the final disposition of structured vs. unstructured lines, attribute ordering, and quoting, as described in https://github.com/samtools/hts-specs/issues/642/ https://github.com/samtools/htsjdk/issues/1610, and also https://github.com/samtools/hts-specs/pull/620.
We've been discussing the lack of support for VCF 4.4 in IGV (https://github.com/igvteam/igv/issues/1289) and started looking at the current htsjdk / VCF interaction. This PR looks like a major step forward to resolve issues around VCF 4.3 - thanks for all your work on this! Any chance of merging this soon so we can consider what changes would be required for 4.4?
@ohofmann Hi Oliver - yes, this PR addresses quite a few issues with VCF support in htsjdk, and is a prerequisite for a couple of other large PRs for VCF/BCF support. The team has been sidetracked for quite a while with upgrading several of these repos to Java 17, which turned out to be a lot more work than anticipated, but we finally got the main PR for that merged last week. We discussed this PR in our team meeting this morning, and we do plan to get it merged sometime in the next quarter. It has had a fair amount of review/discussion already, but its quite large and will require a major version release, and has significant downstream test impact on projects such as GATK (I haven't tried to build IGV with it, but I expect the IGV fallout to be minimal). Hope that help.s
@cmnbroad Thanks for the update! It does help, though we may have to consider postponing the VCF 4.4 announcement / PR drive. I wouldn't want the first experience of many users to be an htsjdk error. Will keep an eye on this PR in the meantime.
@lbergelson I've rebased this on master/Java 17.