Get list of INFO and FORMAT keys from Header

Open edg1983 opened this issue 4 years ago • 1 comments

Hi Brent,

Is there a way to get the list of keys for INFO and FORMAT definitions from the header of a VCF using hts-nim?

The idea I'm working on is to get all the INFO and FORMAT keys defined in the headers of two VCFs, compute their intersection, and output a new VCF containing only the fields shared by both.

Thanks!

edg1983 · Jun 17 '21 16:06

Hi Edoardo, there's not currently a nice way to do this. You could get the header string from each header, write your own code to get the intersection of INFO and FORMAT fields, and then, for each shared key, use e.g.:

try:
  var hi = ivcf.header[key]
  # do something with hi (HeaderInfo)
except KeyError:
  continue

and you can merge headers as here: https://github.com/brentp/tnsv/blob/main/tnsv.nim#L38 (just letting htslib do that part). In short, it's possible, but will be a lot of work and string parsing. If you give it a go and get stuck I'll attempt to help.
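
As a rough illustration of the string-parsing part, here is a minimal sketch that uses only the Nim standard library. It assumes each header is already available as plain text and that `ID=` is the first field of every `##INFO=<...>` and `##FORMAT=<...>` line, as it usually is. `headerKeys`, `headerA` and `headerB` are hypothetical names made up for this example; they are not part of hts-nim.

```nim
import std/[strutils, sets]

proc headerKeys(headerText: string, kind: string): HashSet[string] =
  ## Collect the ID of every `##<kind>=<ID=...` line in a VCF header string.
  ## `kind` is "INFO" or "FORMAT". Assumes ID is the first field of the line.
  result = initHashSet[string]()
  let prefix = "##" & kind & "=<ID="
  for line in headerText.splitLines():
    if line.startsWith(prefix):
      let rest = line[prefix.len .. ^1]
      # The ID ends at the first ',' (or at '>' if it is the only field).
      var stop = rest.find({',', '>'})
      if stop < 0: stop = rest.len
      if stop > 0:
        result.incl(rest[0 ..< stop])

# Two made-up header fragments standing in for the real header text.
const headerA = """##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">"""

const headerB = """##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">
##INFO=<ID=MQ,Number=1,Type=Float,Description="Mapping quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">"""

# `*` on two HashSets is their intersection.
let sharedInfo = headerKeys(headerA, "INFO") * headerKeys(headerB, "INFO")
let sharedFormat = headerKeys(headerA, "FORMAT") * headerKeys(headerB, "FORMAT")
```

The keys in `sharedInfo` and `sharedFormat` are the ones to look up in each header with the `try`/`except` pattern above.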

brentp · Jun 18 '21 08:06