Compare mehari annotation with VEP annotation
- build mehari transcript DB (Ensembl):
  - download cdot for Ensembl and GRCh37 / GRCh38
  - download Ensembl FASTA for transcripts
  - create / fill seqrepo with Ensembl FASTAs
  - build mehari database
- install VEP + caches
  - caches: GRCh37 release 105, GRCh38 release 110
- obtain test dataset, e.g.:
  - ClinVar VCF
    - https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/
    - https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/
  - genes with few variants: TGDS, KYNU
  - genes with lots of variants: BRCA1, BRCA2, TTN
  - later: extend to include certain regions in gnomAD (via tabix https://)
    - regions: BRCA1, BRCA2, TTN, SLC39A14 (ManePlusClinical)
  - ClinVar VCF
- comparison:
  - VEP on ClinVar GRCh37/GRCh38
  - local check: build data for e.g. TGDS
  - annotate all transcripts with both mehari and VEP
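To build the "genes with few variants" subset (e.g. TGDS, KYNU), one option is to filter the ClinVar VCF on its `GENEINFO` INFO key. ClinVar uses this key to list `SYMBOL:GeneID` pairs separated by `|`. The Python below is a minimal sketch that assumes that encoding and uses plain line-based parsing. All function names are hypothetical, and for large indexed files a tabix region query would usually be the faster route.

```python
"""Hypothetical helper: keep ClinVar VCF records whose GENEINFO names a gene of interest."""
import gzip
from typing import Iterable, Iterator, Set


def genes_from_geneinfo(info: str) -> Set[str]:
    """Return gene symbols from an INFO column such as 'GENEINFO=TGDS:23483|GPR180:160897'."""
    for entry in info.split(";"):
        if entry.startswith("GENEINFO="):
            value = entry[len("GENEINFO="):]
            # Assumed encoding: SYMBOL:NCBI_GENE_ID items separated by '|'.
            return {item.split(":", 1)[0] for item in value.split("|") if item}
    return set()


def filter_vcf_lines(lines: Iterable[str], genes: Set[str]) -> Iterator[str]:
    """Yield all header lines plus the records that touch any gene in `genes`."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # INFO is the 8th VCF column (index 7).
        if len(fields) > 7 and genes_from_geneinfo(fields[7]) & genes:
            yield line


def filter_vcf_file(src: str, dst: str, genes: Set[str]) -> None:
    """Read a plain or (b)gzipped VCF and write the gene subset as uncompressed VCF."""
    opener = gzip.open if src.endswith(".gz") else open
    with opener(src, "rt") as fin, open(dst, "w") as fout:
        fout.writelines(filter_vcf_lines(fin, genes))
```

For example, `filter_vcf_file("clinvar.vcf.gz", "clinvar.tgds_kynu.vcf", {"TGDS", "KYNU"})` would write the small test set. Both file names are placeholders. Python's `gzip` module can read bgzip output because bgzip files are multi-member gzip streams.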
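A per-transcript diff of consequence terms is one simple metric for the comparison step. The Python below is a sketch under several assumptions:

- VEP was run with VCF output, so its consequences sit in the `CSQ` INFO key. The header lists the CSQ field order after `Format:`.
- mehari writes ANN-style entries containing `Feature_ID` and `Annotation` columns. These column names are an assumption to check against real mehari output.
- All function names are hypothetical.

Transcript versions are stripped by default because the VEP cache and the mehari database may carry different versions of the same Ensembl transcript.

```python
"""Hypothetical sketch: diff per-transcript consequences between VEP (CSQ) and mehari (ANN)."""
import re
from typing import Dict, List, Optional, Set, Tuple


def vep_csq_fields(header_line: str) -> List[str]:
    """Return the field names listed after 'Format:' in VEP's CSQ ##INFO header line."""
    match = re.search(r'Format: ([^"]+)"', header_line)
    if match is None:
        raise ValueError("no 'Format:' in CSQ header line")
    return match.group(1).split("|")


def info_value(info: str, key: str) -> Optional[str]:
    """Return the raw value of INFO key `key`, or None if it is absent."""
    for entry in info.split(";"):
        if entry.startswith(key + "="):
            return entry[len(key) + 1:]
    return None


def consequences_by_transcript(
    raw: str, fields: List[str], tx_field: str, csq_field: str, strip_version: bool = True
) -> Dict[str, Set[str]]:
    """Map transcript ID -> set of consequence terms for one VCF record.

    `raw` holds comma-separated entries, each pipe-separated in `fields` order.
    Multiple terms in one entry are joined with '&' (VEP CSQ; assumed for mehari ANN).
    """
    tx_idx, csq_idx = fields.index(tx_field), fields.index(csq_field)
    result: Dict[str, Set[str]] = {}
    for entry in raw.split(","):
        parts = entry.split("|")
        tx = parts[tx_idx]
        if not tx:
            continue  # e.g. intergenic entries carry no transcript
        if strip_version:
            tx = tx.split(".", 1)[0]
        result.setdefault(tx, set()).update(t for t in parts[csq_idx].split("&") if t)
    return result


def diff_transcripts(
    vep: Dict[str, Set[str]], mehari: Dict[str, Set[str]]
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return transcripts whose consequence sets differ, including one-sided ones."""
    out: Dict[str, Tuple[Set[str], Set[str]]] = {}
    for tx in sorted(vep.keys() | mehari.keys()):
        a, b = vep.get(tx, set()), mehari.get(tx, set())
        if a != b:
            out[tx] = (a, b)
    return out
```

Per record, one would pass `info_value(info, "CSQ")` with `vep_csq_fields(...)`, `"Feature"` and `"Consequence"` for VEP, and `info_value(info, "ANN")` with the ANN column list, `"Feature_ID"` and `"Annotation"` for mehari. The non-empty results of `diff_transcripts` are then the cases to inspect.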
I had started https://github.com/varfish-org/annotation-zoo a while ago, but that repo got stalled, since I was busy with DHA stuff. It would be great if we could put repro stuff there unless we want to move towards a more monorepo approach for mehari.
Let's continue in a central repo, but I'd rather have this called mehari-validation or similar?
We are doing this in https://github.com/varfish-org/mehari-annotation-comparison, but it may need some tidying up and a README.