exomiser does not output values from ReMM or CADD

Open XGuoHI opened this issue 6 years ago • 6 comments

In the variant TSV files, all CADD and REMM values are ".", as in the following rows:

#CHROM | POS | REF | ALT | QUAL | CADD(>0.483) | POLYPHEN(>0.956|>0.446) | MUTATIONTASTER(>0.94) | SIFT(<0.06) | REMM
11 | 1.13E+08 | C | A | 198.84 | . | 0.063 | . | 1 | .
1 | 89449390 | T | C | 1061 | . | 0.01 | 1 | 1 | .
10 | 95931011 | A | G | 1125.52 | . | 0.004 | . | 1 | .

XGuoHI, May 26 '18 00:05

I have downloaded the CADD3.1 and ReMM datasets and placed them in the ./data/ folder.

XGuoHI, May 26 '18 00:05

I have the same experience. Any progress on this issue?

oleraj, Nov 18 '19 19:11

The data is in the JSON output file. If you need TSV, you can use something like jq to slice the JSON output into TSV. The TSV format isn't flexible, and adding new fields to it would likely break people's code.
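A minimal sketch of that kind of JSON-to-TSV slicing, written in Python rather than jq. The field names used here (geneSymbol, variantEvaluations, chromosomeName, pathogenicityData, predictedPathogenicityScores) are assumptions about the JSON layout and may differ between Exomiser versions, so check them against your own output file first. The sample record and its values are placeholders, not real results.

```python
import csv
import io
import json

# Hypothetical record shaped like an Exomiser JSON result.
# The field names are assumptions; the values are made-up placeholders.
SAMPLE_JSON = """
[
  {
    "geneSymbol": "GENE1",
    "variantEvaluations": [
      {
        "chromosomeName": "1",
        "position": 12345,
        "ref": "A",
        "alt": "G",
        "pathogenicityData": {
          "predictedPathogenicityScores": [
            {"source": "CADD", "score": 0.5},
            {"source": "SIFT", "score": 1.0}
          ]
        }
      }
    ]
  }
]
"""


def json_to_tsv(json_text, sources=("CADD", "REMM")):
    """Flatten gene/variant records into TSV with one column per score source."""
    genes = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["GENE", "CHROM", "POS", "REF", "ALT", *sources])
    for gene in genes:
        for variant in gene.get("variantEvaluations", []):
            path_data = variant.get("pathogenicityData", {})
            # Index the per-variant scores by their source name.
            scores = {
                entry["source"]: entry["score"]
                for entry in path_data.get("predictedPathogenicityScores", [])
            }
            writer.writerow([
                gene.get("geneSymbol", "."),
                variant.get("chromosomeName", "."),
                variant.get("position", "."),
                variant.get("ref", "."),
                variant.get("alt", "."),
                # Write "." when a source has no score for this variant.
                *[scores.get(src, ".") for src in sources],
            ])
    return out.getvalue()


if __name__ == "__main__":
    print(json_to_tsv(SAMPLE_JSON), end="")
```

To run it on real output, read the JSON file into a string and pass it to json_to_tsv in place of SAMPLE_JSON. Because the columns are chosen by the caller, new fields in the JSON do not change the TSV layout.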

julesjacobsen, Nov 19 '19 14:11

Thanks, I see the CADD scores in the JSON file but no REMM score. It appears the data files are found, but no annotations are added in the output.

2019-11-26 17:23:16.695  INFO 34682 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening CADD snv data from source: /path/to/exomiser/exomiser-cli-12.1.0/data/1902_hg19/whole_genome_SNVs.tsv.gz
2019-11-26 17:23:16.852  INFO 34682 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening CADD InDel data from source: /path/to/exomiser/exomiser-cli-12.1.0/data/1902_hg19/InDels.tsv.gz
2019-11-26 17:23:16.969  INFO 34682 --- [           main] o.m.e.a.genome.GenomeDataSourceLoader    : Opening REMM data from source: /path/to/exomiser/exomiser-cli-12.1.0/data/1902_hg19/ReMM.v0.3.1.tsv.gz

Any idea why the REMM scores would be missing? I downloaded the file from here: https://charite.github.io/software-remm-score.html

Does the file need to be reformatted for Exomiser?

zcat ReMM.v0.3.1.tsv.gz | head -n 5
# ReMM score version 0.3.1
# CHR	POS	PROBABILITY
1	10001	0.0680
1	10002	0.0680
1	10003	0.0710

Thanks

oleraj, Nov 26 '19 23:11

Have you added REMM and CADD to the pathogenicitySources: ?

Note also that REMM is trained on non-coding variants, so if you're analysing exome data you won't see any REMM scores. The REMM data file doesn't need reformatting.
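A quick way to check the first point is to scan the analysis settings for the pathogenicitySources entry. The sketch below assumes the entry is written as an inline list on one line, such as pathogenicitySources: [POLYPHEN, MUTATION_TASTER, SIFT]. The sample text and the helper name missing_sources are hypothetical, and a proper YAML parser would be more robust than this regular expression.

```python
import re

# Sources discussed in this thread that must be listed explicitly.
REQUIRED = ("REMM", "CADD")

# Hypothetical excerpt of an analysis file (inline-list form assumed).
SAMPLE_ANALYSIS = """
analysis:
    pathogenicitySources: [POLYPHEN, MUTATION_TASTER, SIFT]
"""


def missing_sources(analysis_text, required=REQUIRED):
    """Return the required sources absent from pathogenicitySources."""
    match = re.search(
        r"^\s*pathogenicitySources:\s*\[([^\]]*)\]",
        analysis_text,
        re.MULTILINE,
    )
    if match is None:
        # No uncommented inline entry found, so nothing is configured.
        return list(required)
    configured = {item.strip() for item in match.group(1).split(",") if item.strip()}
    return [src for src in required if src not in configured]


if __name__ == "__main__":
    print(missing_sources(SAMPLE_ANALYSIS))
```

An empty list means both sources are configured. Anything returned still has to be added to the list before the corresponding scores can appear in the output.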

julesjacobsen, Dec 03 '19 09:12

Sorry for the delay, I didn't see this response until now. Yes, I'm analyzing exome data, so that explains it. I do now see a couple of variants that passed due to ClinVar whitelisting, and they have REMM scores, so the file is being read properly and your explanation makes sense.

oleraj, Jan 27 '20 19:01