hts-specs
Formally define GA4GH and MD5 checksums in VCF v4.4
As discussed during GA4GH Connect 2021 calls:
@d-cameron
Just checked VCFv4.3: MD5 isn't even formally defined as a ##contig header field. We should add both ga4gh & md5 into the specs for 4.4
md5 is shown in the example VCFs included in the specs, and the specs mention MD5 in S1.4.7, but it's not actually formally defined (e.g. header name, how the md5 hash is string encoded, etc.)
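To make the gap concrete, here is a minimal Python sketch of how a reader might pull the `md5` field out of a `##contig` line shaped like the example in the VCF specification. Since the field is not formally defined, everything about its handling here is an assumption: that the key is lowercase `md5`, that the value is 32 hexadecimal digits, and that it can be case-folded to lowercase. The helper names `parse_contig_line` and `contig_md5` are hypothetical.

```python
import re

# key=value pairs; a value is either double-quoted or runs up to the next comma.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_contig_line(line):
    """Return the key/value fields of a ##contig header line as a dict."""
    prefix, suffix = "##contig=<", ">"
    line = line.strip()
    if not (line.startswith(prefix) and line.endswith(suffix)):
        raise ValueError("not a ##contig header line")
    body = line[len(prefix):-len(suffix)]
    fields = {}
    for key, value in _PAIR.findall(body):
        if value.startswith('"'):
            # Drop the surrounding quotes (escape sequences are left untouched).
            value = value[1:-1]
        fields[key] = value
    return fields


def contig_md5(line):
    """Return the md5 field as lowercase hex, or None if absent or malformed.

    Assumption: the value is a 32-digit hexadecimal string. The spec does not
    currently say how the hash is encoded, which is the point of this issue.
    """
    md5 = parse_contig_line(line).get("md5")
    if md5 is not None and re.fullmatch(r"[0-9a-fA-F]{32}", md5):
        return md5.lower()
    return None


example = ('##contig=<ID=20,length=62435964,assembly=B36,'
           'md5=f126cdf8a6e0c7f379d618ff66beb2da,'
           'species="Homo sapiens",taxonomy=x>')
print(contig_md5(example))
```

A formal definition would let a parser like this reject or accept values (uppercase hex, other encodings) on the basis of the spec rather than a guess.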
@jkbonfield
Agreed, it needs a tight definition - uppercasing, white space removal, what to do about out of range chars, etc.
Hope we can refer to refget's spec as the reference for that final point @jkbonfield and if refget's spec is insufficient we will update accordingly
Thank you for starting the discussion, @tskir - very much in support of this and would love to see this integrated with the refget services.
I'm hoping the definition of what's valid and how to deal with invalid chars is compatible between CRAM (edit: actually SAM) and RefGet (I'm sure they must be given the ancestry), which will therefore serve as the logical starting point for VCF.
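As a starting point for that comparison, the sketch below shows the kind of normalisation the SAM specification describes for its reference MD5, as I understand it: drop every character outside the printable ASCII range 33 ('!') to 126 ('~'), uppercase the rest, and report the MD5 as lowercase hexadecimal. Whether refget and a future VCF definition should match this byte for byte is exactly the open question here, so treat the rules in the code as an assumption rather than the agreed VCF definition. The function name `reference_md5` is hypothetical.

```python
import hashlib


def reference_md5(sequence):
    """MD5 of a reference sequence after SAM-style normalisation (assumed rules).

    1. Drop every character outside the inclusive ASCII range 33..126
       (this removes spaces, newlines, and any non-ASCII characters).
    2. Convert the remaining characters to uppercase.
    3. Return the MD5 digest as a 32-character lowercase hex string.
    """
    raw = sequence.encode("ascii", errors="ignore")
    kept = bytes(b for b in raw if 33 <= b <= 126)
    return hashlib.md5(kept.upper()).hexdigest()


# Case and whitespace differences should not change the checksum.
print(reference_md5("acgt ACGT\nnnnn\n") == reference_md5("ACGTACGTNNNN"))
```

Under these rules, line wrapping and soft-masking (lowercase bases) in a FASTA file do not affect the checksum, which is what allows the same value to identify a sequence across differently formatted copies.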
Downstream, I think it's maybe worth trying to enforce more for VCF 4.4. Right now, not only does it not require any checksum or assembly information, even the contig lines themselves are purely optional! I can see it may help for rapid hacking around, but we're past those days and should really focus on data provenance for future spec versions.
A way to refer to a reference collection in 1 line instead of needing thousands of contig lines would make requiring it a much easier pill for most users to swallow.
Don't we almost have that specced out in refget?
On Thu, Mar 4, 2021 at 3:45 PM Louis Bergelson wrote:
A way to refer to a reference collection in 1 line instead of needing thousands of contig lines would make requiring it a much easier pill for most users to swallow.
It's getting there :) @yfarjoun