htsjdk icon indicating copy to clipboard operation
htsjdk copied to clipboard

VCFHeader and VCFHeaderLine rewrite/refactoring

Open cmnbroad opened this issue 4 years ago • 6 comments

This PR is resurrected from the PR graveyard, updated and rebased. It contains:

  • significant updates to VCFHeader and VCFHeaderLine class hierarchy
  • significantly updated VCFHeader merging
  • prerequisites for writing VCF4.3 and BCF2.2
  • some source-level incompatibilities for VCFHeader and VCFHeaderLine (a detailed document is being prepared)

Fixes numerous issues, including:

  • https://github.com/samtools/htsjdk/issues/250
  • https://github.com/samtools/htsjdk/issues/251
  • https://github.com/samtools/htsjdk/issues/1574
  • https://github.com/samtools/htsjdk/issues/277
  • https://github.com/samtools/htsjdk/issues/500
  • https://github.com/samtools/htsjdk/issues/507
  • https://github.com/samtools/htsjdk/issues/730
  • https://github.com/samtools/htsjdk/issues/575
  • https://github.com/samtools/htsjdk/issues/929
  • https://github.com/samtools/htsjdk/issues/1573
  • https://github.com/samtools/htsjdk/issues/1667

cmnbroad avatar Nov 09 '21 22:11 cmnbroad

Codecov Report

Merging #1581 (3084903) into master (347c0ac) will increase coverage by 0.022%. The diff coverage is 79.088%.

:exclamation: Current head 3084903 differs from pull request most recent head a4c529d. Consider uploading reports for the commit a4c529d to get more accurate results

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@               Coverage Diff               @@
##              master     #1581       +/-   ##
===============================================
+ Coverage     69.856%   69.878%   +0.022%     
- Complexity      9695      9832      +137     
===============================================
  Files            703       706        +3     
  Lines          37772     38022      +250     
  Branches        6139      6155       +16     
===============================================
+ Hits           26386     26569      +183     
- Misses          8929      8965       +36     
- Partials        2457      2488       +31     
Impacted Files Coverage Δ
...in/java/htsjdk/samtools/SAMSequenceDictionary.java 75.000% <ø> (ø)
...c/main/java/htsjdk/variant/vcf/VCFRecordCodec.java 0.000% <0.000%> (ø)
src/main/java/htsjdk/variant/vcf/VCFCodec.java 25.000% <22.222%> (-52.273%) :arrow_down:
src/main/java/htsjdk/variant/vcf/VCFUtils.java 45.570% <28.571%> (-3.706%) :arrow_down:
src/main/java/htsjdk/variant/vcf/VCF3Codec.java 57.895% <50.000%> (-5.520%) :arrow_down:
...n/java/htsjdk/variant/vcf/VCFFormatHeaderLine.java 66.667% <57.895%> (-3.333%) :arrow_down:
...tsjdk/variant/variantcontext/writer/VCFWriter.java 77.451% <65.000%> (-5.345%) :arrow_down:
...main/java/htsjdk/variant/vcf/VCFHeaderVersion.java 75.000% <66.667%> (+3.125%) :arrow_up:
...n/java/htsjdk/variant/vcf/VCFSampleHeaderLine.java 66.667% <66.667%> (-33.333%) :arrow_down:
...main/java/htsjdk/variant/vcf/VCFAltHeaderLine.java 72.727% <68.421%> (-27.273%) :arrow_down:
... and 22 more

... and 69 files with indirect coverage changes

codecov-commenter avatar Nov 09 '21 22:11 codecov-commenter

Note: this needs to be resolved/updated to account for the final disposition of structured vs. unstructured lines, attribute ordering, and quoting, as described in https://github.com/samtools/hts-specs/issues/642/ https://github.com/samtools/htsjdk/issues/1610, and also https://github.com/samtools/hts-specs/pull/620.

cmnbroad avatar Oct 17 '22 13:10 cmnbroad

We've been discussing the lack of support for VCF 4.4 in IGV (https://github.com/igvteam/igv/issues/1289) and started looking at the current htsjdk / VCF interaction. This PR looks like a major step forward to resolve issues around VCF 4.3 - thanks for all your work on this! Any chance of merging this soon so we can consider what changes would be required for 4.4?

ohofmann avatar Mar 02 '23 23:03 ohofmann

@ohofmann Hi Oliver - yes, this PR addresses quite a few issues with VCF support in htsjdk, and is a prerequisite for a couple of other large PRs for VCF/BCF support. The team has been sidetracked for quite a while with upgrading several of these repos to Java 17, which turned out to be a lot more work than anticipated, but we finally got the main PR for that merged last week. We discussed this PR in our team meeting this morning, and we do plan to get it merged sometime in the next quarter. It has had a fair amount of review/discussion already, but its quite large and will require a major version release, and has significant downstream test impact on projects such as GATK (I haven't tried to build IGV with it, but I expect the IGV fallout to be minimal). Hope that help.s

cmnbroad avatar Mar 06 '23 19:03 cmnbroad

@cmnbroad Thanks for the update! It does help, though we may have to consider postponing the VCF 4.4 announcement / PR drive. I wouldn't want the first experience of many users to be an htsjdk error. Will keep an eye on this PR in the meantime.

ohofmann avatar Mar 06 '23 22:03 ohofmann

@lbergelson I've rebased this on master/Java 17.

cmnbroad avatar Mar 14 '23 14:03 cmnbroad