Update parsing of SV and CNV files

Open ch-kr opened this issue 11 months ago • 2 comments

The v4.0 SV and CNV files currently have annotations in the order <group>_<metric> rather than the format the short variant team uses (<metric>_<group>). For example, one of the annotations in the CNV VCF is afr_SC, but this metric should be SC_afr for consistency with the short variant team.

As part of v4.1, I've asked Xuefang and Jack to update the SV and CNV files, respectively. More specifically, I have requested the following updates:

  • Update frequency annotation order to be metric followed by group (thread here)
  • Update genetic ancestry group labels
    • Related: update popmax to grpmax / all references to "population" to "genetic ancestry group"
  • Update MALE/FEMALE to XY/XX

Xuefang is also planning to make the following updates:

  • Update GT info for mCNVs
  • Update gene annotation list for consistency with short variant team

Would it be possible to update the code ingesting the SV/CNV data to work with these updates?
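
For illustration, here is a minimal sketch of the key reordering described above, as it might look on the ingest side. The function name `reorder_annotation`, the `KNOWN_GROUPS` set, and the example keys other than afr_SC are hypothetical; the actual v4.1 field names and genetic ancestry group labels may differ, and the popmax to grpmax rename is not covered here.

```python
# Sketch only: move leading group tokens behind the metric, so that
# <group>_<metric> becomes <metric>_<group>. All names are assumptions.

# Assumed group tokens; the real v4.1 label set may differ.
KNOWN_GROUPS = {"afr", "amr", "asj", "eas", "fin", "mid", "nfe", "sas", "XX", "XY"}

# Sex label rename requested in the issue.
SEX_LABELS = {"MALE": "XY", "FEMALE": "XX"}


def reorder_annotation(key, groups=KNOWN_GROUPS):
    """Return `key` with sex labels renamed and leading group tokens moved to the end."""
    parts = [SEX_LABELS.get(part, part) for part in key.split("_")]

    # Count how many leading tokens are group labels.
    n_leading = 0
    while n_leading < len(parts) and parts[n_leading] in groups:
        n_leading += 1

    # No group prefix, or nothing but group tokens: leave the order alone.
    if n_leading == 0 or n_leading == len(parts):
        return "_".join(parts)

    return "_".join(parts[n_leading:] + parts[:n_leading])


print(reorder_annotation("afr_SC"))
```

Keys that carry no group prefix pass through unchanged apart from the MALE/FEMALE rename, so the same function could in principle be applied to every INFO key without special-casing.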

ch-kr avatar Mar 11 '24 18:03 ch-kr

Will chime in to say that I will be incorporating an explicit count of "remaining" samples this time around. Elissa had made last-minute pre-launch changes so that the table added up in the absence of me exposing the "remaining" sample carrier counts.

JMF47 avatar Mar 11 '24 18:03 JMF47

@JMF47 @ch-kr It looks like we have some new consequences in the updated file, specifically:

NONCODING_BREAKPOINT
NONCODING_SPAN
PARTIAL_DISPERSED_DUP

...which is no problem per se, but I do need to know how those new ones fit into the ranking of consequences by severity, which for your reference currently is:

LOF
INTRAGENIC_EXON_DUP
PARTIAL_EXON_DUP
COPY_GAIN
TSS_DUP
MSV_EXON_OVERLAP
DUP_PARTIAL
BREAKEND_EXONIC
UTR
PROMOTER
INTRONIC
INV_SPAN
INTERGENIC
NEAREST_TSS

Please let me know where those fit in. For the moment, I'm going to just add them at the end of the list.
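
For reference, a minimal sketch of that placeholder ordering, assuming the ranking is kept as an ordered list where a lower index means more severe. The names `RANKED_CONSEQUENCES`, `SEVERITY`, and `most_severe` are hypothetical and not taken from the browser code, and the position of the three new terms is provisional.

```python
# Sketch only: consequence terms ordered from most to least severe.
RANKED_CONSEQUENCES = [
    "LOF",
    "INTRAGENIC_EXON_DUP",
    "PARTIAL_EXON_DUP",
    "COPY_GAIN",
    "TSS_DUP",
    "MSV_EXON_OVERLAP",
    "DUP_PARTIAL",
    "BREAKEND_EXONIC",
    "UTR",
    "PROMOTER",
    "INTRONIC",
    "INV_SPAN",
    "INTERGENIC",
    "NEAREST_TSS",
    # New terms, appended at the end until their real rank is confirmed.
    "NONCODING_BREAKPOINT",
    "NONCODING_SPAN",
    "PARTIAL_DISPERSED_DUP",
]

# Lower rank number means more severe.
SEVERITY = {term: rank for rank, term in enumerate(RANKED_CONSEQUENCES)}


def most_severe(consequences):
    """Return the most severe known term, or None if no term is recognized."""
    known = [term for term in consequences if term in SEVERITY]
    if not known:
        return None
    return min(known, key=SEVERITY.__getitem__)


print(most_severe(["INTRONIC", "NONCODING_SPAN", "LOF"]))
```

With this layout, moving a new term to its proper rank is a one-line change to the list, and anything that depends on the ordering picks it up through `SEVERITY`.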

phildarnowsky-broad avatar Mar 29 '24 16:03 phildarnowsky-broad

Closed by #1469

rileyhgrant avatar Apr 18 '24 16:04 rileyhgrant