multi-allelic split alleles in GA4GH
In ADAM/Mango, a genotype call is defined as a list of GenotypeAllele values here:
https://github.com/bigdatagenomics/bdg-formats/blob/master/src/main/resources/avro/bdg.avdl#L966
Each GenotypeAllele is one of the enum values defined here:
https://github.com/bigdatagenomics/bdg-formats/blob/master/src/main/resources/avro/bdg.avdl#L753
In cases where a multi-allelic variant was split (as it is when loading into ADAM), an allele within a genotype can be OTHER_ALT, as described here: https://github.com/bigdatagenomics/bdg-formats/blob/master/src/main/resources/avro/bdg.avdl#L766
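To make the splitting concrete, here is a minimal sketch of how a VCF-style genotype at a multi-allelic site could turn into per-alternate GenotypeAllele lists. The `GenotypeAllele` class below is an illustrative stand-in for the bdg-formats enum (the `NO_CALL` member is an assumption; check the linked definition), and `split_genotype` is a hypothetical helper, not ADAM's actual loader code.

```python
from enum import Enum
from typing import List, Optional, Tuple


class GenotypeAllele(Enum):
    # Illustrative stand-in for the bdg-formats enum; NO_CALL is assumed here.
    REF = "REF"
    ALT = "ALT"
    OTHER_ALT = "OTHER_ALT"
    NO_CALL = "NO_CALL"


def split_genotype(
    ref: str, alts: List[str], gt: List[Optional[int]]
) -> List[Tuple[str, str, List[GenotypeAllele]]]:
    """Hypothetical helper: split a VCF-style genotype (0 = ref, k = k-th alt,
    None = missing) into one bi-allelic record per alternate allele."""
    records = []
    for i, alt in enumerate(alts, start=1):
        alleles = []
        for idx in gt:
            if idx is None:
                alleles.append(GenotypeAllele.NO_CALL)
            elif idx == 0:
                alleles.append(GenotypeAllele.REF)
            elif idx == i:
                # This allele is the alternate of the current split record.
                alleles.append(GenotypeAllele.ALT)
            else:
                # A different alternate, which now lives in another split record.
                alleles.append(GenotypeAllele.OTHER_ALT)
        records.append((ref, alt, alleles))
    return records
```

For example, a `1/2` call at a site with REF `T` and ALTs `A,C` becomes `[ALT, OTHER_ALT]` on the `T>A` record and `[OTHER_ALT, ALT]` on the `T>C` record.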
In the GA4GH schema, a genotype call is defined here: https://ga4gh-schemas.readthedocs.io/en/latest/schemas/variants.proto.html#protobuf.Call and can represent multi-allelic sites.
When we report variant calls based on ADAM/Mango data in GA4GH API format, I am unsure how to represent OTHER_ALT.
For now, I plan to use "." for OTHER_ALT, which is documented to mean missing.
If anyone has comments on my interpretation or a better approach, let me know.
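A minimal sketch of the proposed mapping, assuming the GA4GH `Call.genotype` list can hold either integer allele indexes or the string "." for missing. `to_ga4gh_genotype` and `MISSING` are hypothetical names, and `GenotypeAllele` is again an illustrative stand-in for the bdg-formats enum (with an assumed `NO_CALL` member).

```python
from enum import Enum
from typing import List, Union


class GenotypeAllele(Enum):
    # Illustrative stand-in for the bdg-formats enum; NO_CALL is assumed here.
    REF = "REF"
    ALT = "ALT"
    OTHER_ALT = "OTHER_ALT"
    NO_CALL = "NO_CALL"


MISSING = "."  # assumed GA4GH encoding for a missing allele


def to_ga4gh_genotype(alleles: List[GenotypeAllele]) -> List[Union[int, str]]:
    """Hypothetical mapping for a split (bi-allelic) ADAM variant:
    REF -> 0, ALT -> 1 (the single alternate of the split record),
    OTHER_ALT and NO_CALL -> "." (missing)."""
    mapping = {
        GenotypeAllele.REF: 0,
        GenotypeAllele.ALT: 1,
        GenotypeAllele.OTHER_ALT: MISSING,
        GenotypeAllele.NO_CALL: MISSING,
    }
    return [mapping[a] for a in alleles]
```

The trade-off is that OTHER_ALT and NO_CALL become indistinguishable in the output: a reader of the GA4GH call can no longer tell that the allele was a known non-reference allele rather than an uncalled one.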
@david4096 may be interested
That's the same approach we use upstream in ADAM, so +1 from me!
Would you open an issue describing this at https://github.com/ga4gh/ga4gh-schemas/issues?
I don't see a problem with adopting the same approach.