gnomad-browser
Update parsing of SV and CNV files
The v4.0 SV and CNV files currently have annotations in the order `<group>_<metric>` rather than the format the short variant team uses (`<metric>_<group>`). For example, one of the annotations in the CNV VCF is `afr_SC`, but this metric should be `SC_afr` for consistency with the short variant team.
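A minimal sketch of the reordering, assuming the old-format keys always end in a known metric name. Only `SC` appears in this thread; the other metric names in `KNOWN_METRICS` and the function name `to_metric_group` are hypothetical. Group strings can contain underscores (for example a group plus sex suffix), so the split is anchored on the metric rather than on the first underscore.

```python
# Hypothetical metric list: only "SC" is confirmed by the example above.
KNOWN_METRICS = ("SC", "SN", "SF", "AC", "AN", "AF")


def to_metric_group(key: str, metrics=KNOWN_METRICS) -> str:
    """Rewrite an old-format '<group>_<metric>' key as '<metric>_<group>'.

    Keys that do not end in a known metric are returned unchanged.
    """
    # Check longer metric names first so a longer name would take precedence
    # over a shorter one that happens to be its suffix.
    for metric in sorted(metrics, key=len, reverse=True):
        suffix = "_" + metric
        if key.endswith(suffix) and len(key) > len(suffix):
            group = key[: -len(suffix)]
            return f"{metric}_{group}"
    return key
```

For example, `to_metric_group("afr_SC")` returns `"SC_afr"`, and a non-frequency field such as `"SVTYPE"` passes through untouched. This is only meant for keys still in the old order; a key already in the new order could be misread if its group name matched a metric.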
As part of v4.1, I've asked Xuefang and Jack to update the SV and CNV files, respectively. Specifically, I have requested the following updates:
- Update frequency annotation order to be metric followed by group (thread here)
- Update genetic ancestry group labels
- Related: update `popmax` to `grpmax`, and change all references to "population" to "genetic ancestry group"
- Update MALE/FEMALE to XY/XX
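The label changes listed above could be applied token by token on the underscore-separated annotation keys. This is a sketch under the assumption that each label appears as a whole token (for example `AF_popmax` or `SC_afr_FEMALE`); the dictionary and function names are hypothetical, and the genetic ancestry group renames are left out because the thread does not spell them out.

```python
# Renames taken from the request above; group-label renames are not specified
# in this thread and would need to be added once they are known.
TOKEN_RENAMES = {
    "popmax": "grpmax",
    "MALE": "XY",
    "FEMALE": "XX",
}


def relabel(key: str, renames=TOKEN_RENAMES) -> str:
    """Rename each underscore-separated token of an annotation key."""
    return "_".join(renames.get(token, token) for token in key.split("_"))
```

Matching whole tokens rather than substrings avoids accidentally rewriting `FEMALE` into `FEXY` when replacing `MALE`.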
Xuefang is also planning to make the following updates:
- Update GT info for mCNVs
- Update gene annotation list for consistency with short variant team
Would it be possible to update the code ingesting the SV/CNV data to work with these updates?
I'll chime in to say that I will be incorporating an explicit count of "remaining" samples this time around. Elissa made last-minute pre-launch changes so that the table added up in the absence of the "remaining" sample carrier counts, which I had not exposed.
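The thread does not say how the pre-launch workaround made the table add up. One plausible approach, shown here purely as an assumption, is to derive the "remaining" value as the total minus every explicitly listed group; once the count is exposed directly, a simple consistency check can replace the derivation. Both function names are hypothetical.

```python
def derive_remaining(total: int, named_group_counts: dict[str, int]) -> int:
    """Infer a 'remaining' count as the total minus all listed groups (assumed approach)."""
    derived = total - sum(named_group_counts.values())
    if derived < 0:
        raise ValueError("listed group counts exceed the total")
    return derived


def table_adds_up(total: int, group_counts: dict[str, int]) -> bool:
    """Check that explicit per-group counts, including 'remaining', sum to the total."""
    return sum(group_counts.values()) == total
```

With an explicit "remaining" count in the data, `table_adds_up` should hold without any derived values.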
@JMF47 @ch-kr It looks like we have some new consequences in the updated file, specifically:
- NONCODING_BREAKPOINT
- NONCODING_SPAN
- PARTIAL_DISPERSED_DUP
...which is no problem per se, but I do need to know how those new ones fit into the ranking of consequences by severity, which for your reference currently is:
1. LOF
2. INTRAGENIC_EXON_DUP
3. PARTIAL_EXON_DUP
4. COPY_GAIN
5. TSS_DUP
6. MSV_EXON_OVERLAP
7. DUP_PARTIAL
8. BREAKEND_EXONIC
9. UTR
10. PROMOTER
11. INTRONIC
12. INV_SPAN
13. INTERGENIC
14. NEAREST_TSS
Please let me know where those fit in. For the moment, I'm going to just add them at the end of the list.
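The provisional ordering described here (existing ranking, new consequences appended at the end) can be sketched as a lookup from consequence to rank. This is a Python illustration, not the browser's actual implementation; `SEVERITY_RANK` and `most_severe` are hypothetical names, and any term missing from the list is assumed to sort after all known ones.

```python
# Ranking from this thread, most severe first.
CONSEQUENCE_RANKING = [
    "LOF", "INTRAGENIC_EXON_DUP", "PARTIAL_EXON_DUP", "COPY_GAIN", "TSS_DUP",
    "MSV_EXON_OVERLAP", "DUP_PARTIAL", "BREAKEND_EXONIC", "UTR", "PROMOTER",
    "INTRONIC", "INV_SPAN", "INTERGENIC", "NEAREST_TSS",
    # Provisional: appended at the end until their placement is confirmed.
    "NONCODING_BREAKPOINT", "NONCODING_SPAN", "PARTIAL_DISPERSED_DUP",
]

# Lower rank means more severe.
SEVERITY_RANK = {csq: rank for rank, csq in enumerate(CONSEQUENCE_RANKING)}


def most_severe(consequences):
    """Return the most severe consequence, or None for an empty input.

    Unrecognized terms rank after every known consequence.
    """
    if not consequences:
        return None
    fallback = len(CONSEQUENCE_RANKING)
    return min(consequences, key=lambda c: SEVERITY_RANK.get(c, fallback))
```

Keeping the ranking as a single ordered list means moving a new consequence to its confirmed position is a one-line change.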
Closed by #1469