BCF encoding of FLAG info fields

Open jmarshall opened this issue 6 years ago • 3 comments

In §6.4.6 and §6.4.7, the specification says that INFO flag fields are encoded as follows:

0x11 0x50 0x11 0x01    HM3 flag is present

Where 0x11 0x50 encodes the HM3 field name and 0x11 0x01 encodes a 1 value for the flag.

In fact HTSlib / bcftools outputs present INFO flags in BCF as

0x11 0x50 0x00

and always has, essentially ever since the introduction of HTSlib.
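To make the difference between the two byte sequences concrete, here is a minimal Python sketch that builds both. It assumes the typed-descriptor layout from §6.3.3 (element count in the high four bits, atomic type code in the low four bits) and takes 0x50 as the HM3 key value from the spec example. The helper names are hypothetical and are not part of HTSlib.

```python
INT8 = 1      # atomic type code for 8-bit integers
MISSING = 0   # type code 0, the code seen in HTSlib's flag output

def type_byte(count, type_code):
    """Pack a typed descriptor byte: count in the high nibble, type in the low."""
    if not 0 <= count < 15:
        raise ValueError("counts >= 15 need the overflow form, not handled here")
    return (count << 4) | type_code

def info_key(dict_index):
    """Encode an INFO key as a 1-element INT8 typed integer (index must fit in INT8)."""
    if not 0 <= dict_index <= 127:
        raise ValueError("this sketch only handles keys that fit in INT8")
    return bytes([type_byte(1, INT8), dict_index])

def flag_as_documented(dict_index):
    """Flag encoded as the spec text describes: a 1-element INT8 with value 1."""
    return info_key(dict_index) + bytes([type_byte(1, INT8), 1])

def flag_as_htslib(dict_index):
    """Flag encoded as HTSlib / bcftools writes it: a bare 0x00 type byte."""
    return info_key(dict_index) + bytes([type_byte(0, MISSING)])

HM3 = 0x50  # key value taken from the spec's example
print(flag_as_documented(HM3).hex(" "))  # 11 50 11 01
print(flag_as_htslib(HM3).hex(" "))      # 11 50 00
```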

The first paragraphs of §6.3.3 say “0,4,6,8-15 reserved for future use”, so this 0x00 is using an atomic type code that is reserved for future use.

However the Misc. notes at the end of §6.3.3 say

A type byte value of 0x00 is an allowed special case meaning MISSING but without an explicit type provided.

It is unclear whether this just applies to genotype fields or to all fields (in which case it is inconsistent with the text at the start of §6.3.3).
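As an illustration of the reading in which the Misc. notes apply to all fields, a decoder might split a type byte as sketched below. The mapping of code 0 to MISSING follows the Misc. notes text rather than the reserved list, and the function name is hypothetical.

```python
# Atomic type codes; 0 is listed as MISSING per the Misc. notes reading,
# and anything absent from this table is treated as reserved.
TYPE_NAMES = {0: "MISSING", 1: "INT8", 2: "INT16", 3: "INT32", 5: "FLOAT", 7: "CHAR"}

def decode_type_byte(b):
    """Split a typed descriptor byte into (element count, type name)."""
    count = b >> 4       # high nibble: number of elements
    code = b & 0x0F      # low nibble: atomic type code
    return count, TYPE_NAMES.get(code, "reserved")

print(decode_type_byte(0x11))  # (1, 'INT8')
print(decode_type_byte(0x00))  # (0, 'MISSING')
```

Under this reading the 0x00 byte that follows the HM3 key decodes as a zero-length value with no explicit type, rather than as a reserved code.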

HTSlib's output of this flag encoding could be considered a bug in HTSlib. However, as HTSlib has always done this, and given the Misc. notes text, it seems likely that this is indeed intended to be a valid encoding of these fields.

To clarify and fix the specification:

  1. Remove the Misc. notes section, instead adding a 0 row to the BCF2 types table at the top of §6.3.3, removing 0 from the reserved list, and moving the Misc. notes text there (suitably adjusted) to explain the 0 row.

  2. Change the HM3 rows in §6.4.6 and §6.4.7 to reflect reality.

jmarshall, Feb 20 '19 11:02

The HM3 examples were updated to reflect this HTSlib / bcftools reality in PR #456. However the Flags paragraph of §6.3.3 (“Type encoding”) still says

The recommended best practice is to encode the value as an 1-element INT8 (type 0x11) with value of 1 to indicate present.

This sentence should also be updated to match both HTSlib's actual behaviour and the HM3 example.

jmarshall, Jan 12 '22 01:01

Yes, this was very surprising. I implemented according to spec and bcftools view showed my flags as "=1"...

h-2, Jan 26 '22 11:01

Presumably the original implementation (in htslib) reconsidered the flag representation in the early days, and the specification (by the same author) was never updated to match. It's very unfortunate, but incrementally improving as these inconsistencies get noticed.

Can you point us to your implementation's code? I guess this is inside SeqAn?

jmarshall, Jan 26 '22 12:01