R

library(tidyverse)
library(RJSONIO)
anno <- RJSONIO::fromJSON('/Users/mcgaugheyd/Desktop/sample717.json.gz')

anno_tib <- anno$positions %>% bind_rows() %>% as_tibble()

anno_tib %>% dim()

This returns 152,392 variants.

bash

zcat sample717_SmallVariants.genome.vcf.gz | grep PASS | wc -l

This returns 1,811,351 variants

Hmm. The vcf is full of 0/0 and 1/. Perhaps I should remove those?

bash

zcat sample717__SmallVariants.genome.vcf.gz | grep PASS | grep -v "0/0" | grep -v "0/." | wc -l

This returns 490 variants.

OK, let's look at the json file again.

R

anno_tib %>% filter(filters == 'PASS') %>% dim()

This returns 1288 variants.

Can you give some pointers on how to line the json and vcf up?

Aug 04 '20 14:08 davemcg

Is the partial answer that intergenic variants don't get annotated? When I get around to figuring out how to left_join these together I'll figure this out - but right now I'm not 100% confident the json is holding all the variants.

Aug 04 '20 15:08 davemcg

Hi David,

Nirvana will skip the reference call variants from the genome VCF file. Then reason is mainly to control the size of the JSON output.

Aug 04 '20 21:08 haochenl

Is there some way you recommend pulling out the format fields?

Since you didn't put them into the "outside" of the file (like a vcf would have in the header. You did this for some stuff like the gene info) the only way I can figure out how to extract (for example, the 'cosmic' fields) is to do a kind of computationally expensive extraction where I flatten the entire column and run a grep.

Example:

#R

anno_tib %>% pull(variants) %>% unlist() %>% names() %>% grep('cosmic', ., ignore.case = TRUE, value = TRUE) %>% unique() %>% sort()

Aug 06 '20 19:08 davemcg

@davemcg - did you find a solution for this that worked for you? I've given the two Jupyter Notebooks that are part of the docs a try but didn't really get proper output that I could use downstream.

Jul 11 '22 14:07 apeltzer

Nirvana
Nirvana copied to clipboard

JSON parsing. Am I on crazy pills?

R

bash

bash

R

Nirvana Nirvana copied to clipboard

JSON parsing. Am I on crazy pills?

R

bash

bash

R

Nirvana
Nirvana copied to clipboard