
Feature Request - extract Funcotator VCF INFO 'sub-fields'

Open GATKSupportTeam opened this issue 3 years ago • 1 comments

A user on the GATK Forum asked for the Funcotator INFO field to be made easier to manipulate by unpacking it into a table. At the GATK Office Hours meeting on 11/8, we discussed the user's two ideas (quoted below) and favored the first: a new tool, similar to VariantsToTable, that would unpack the INFO field.

This request was created from a contribution made by Shahryar Alavi on October 30, 2020 19:54 UTC.

Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/360073983291-VariantsToTable-not-extracting-INFO-sub-fields-#community_comment_360013343072

--

But MAF output is somewhat different from VCF, and I think the VCF output format is better for germline variant annotation.

With Funcotator we get an integrated (and more detailed) "variant calling - annotation" workflow. The problem is that the "vertical bar" separated values in the INFO field are not easy to handle in downstream text processing.

I have two suggestions for the GATK Team:

1. You may want to develop a new tool (like VariantsToTable) that separates each "sub-info" in the FUNCOTATION INFO field and puts it into its own column, with the corresponding header, when creating the tab-delimited table.

2. Or add a feature to Funcotator to create multiple INFO fields with a FUNCOTATION prefix in their IDs; e.g.

#INFO=<ID=FUNCOTATION\_Gencode\_34\_hugoSymbol,...>

#INFO=<ID=FUNCOTATION\_Gencode\_34\_ncbiBuild,...>

instead of

#INFO=<ID=FUNCOTATION,...,Description="Funcotation fields are: Gencode\_34\_hugoSymbol|Gencode\_34\_ncbiBuild|...">
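Until such a tool or feature exists, the unpacking can be approximated in a short script. The following is a minimal Python sketch, not a GATK tool: it reads the sub-field names from the `Funcotation fields are:` part of the header Description and pairs them with the `|`-separated values. The function names are hypothetical, the two-field header and the `TP53|hg38` value are shortened, made-up examples, and the sketch assumes each annotation in the FUNCOTATION value is wrapped in square brackets. It does not decode any escaped characters inside the values.

```python
import re

# Text that precedes the sub-field list in the FUNCOTATION header Description.
HEADER_PREFIX = "Funcotation fields are: "


def funcotation_field_names(header_line):
    """Return the sub-field names listed in a FUNCOTATION INFO header line."""
    pattern = r'Description="[^"]*' + re.escape(HEADER_PREFIX) + r'([^"]*)"'
    match = re.search(pattern, header_line)
    if match is None:
        raise ValueError("no Funcotation field list found in header line")
    return [name.strip() for name in match.group(1).split("|")]


def split_funcotation(info_value, field_names):
    """Split a FUNCOTATION value into one {name: value} dict per annotation.

    Assumption: each annotation is wrapped in square brackets, e.g.
    "[a|b],[c|d]". A value without brackets is treated as one annotation.
    """
    groups = re.findall(r"\[([^\]]*)\]", info_value) or [info_value]
    rows = []
    for group in groups:
        values = group.split("|")
        if len(values) != len(field_names):
            raise ValueError(
                f"expected {len(field_names)} sub-fields, got {len(values)}"
            )
        rows.append(dict(zip(field_names, values)))
    return rows


def to_tsv(rows, field_names):
    """Render the unpacked annotations as tab-delimited text with a header."""
    lines = ["\t".join(field_names)]
    for row in rows:
        lines.append("\t".join(row[name] for name in field_names))
    return "\n".join(lines)


if __name__ == "__main__":
    # Shortened, illustrative header and value (not real Funcotator output).
    header = (
        '##INFO=<ID=FUNCOTATION,Number=A,Type=String,'
        'Description="Funcotation fields are: '
        'Gencode_34_hugoSymbol|Gencode_34_ncbiBuild">'
    )
    names = funcotation_field_names(header)
    print(to_tsv(split_funcotation("[TP53|hg38]", names), names))
```

Taking the column names from the header rather than hard-coding them keeps the script independent of the data source versions used in a given Funcotator run.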

Thanks

(created from Zendesk ticket #45403)
gz#45403

GATKSupportTeam, Nov 09 '21 22:11