release prep

Open brentp opened this issue 6 years ago • 46 comments

It's time to make a new release; this issue will track what needs to be done. Please add any essential (and only essential) updates as comments to this issue:

  • [x] update clinvar
  • [x] update CADD to v1.4
  • [x] update dbsnp
  • [x] update gnomad and controls and non-neuro control AFs (gnomad 2.1)
  • [x] update dgidb url
  • [x] for X-linked recessive, exclude sites where the male is het and both parents are hom-ref (see #903).
  • [x] #843 (don't report X-linked DNs when one parent is missing).
  • [ ] #902 update gene_summary and gene_detailed
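The two X-linked rules in the checklist (#903 and #843) can be sketched as simple genotype filters. This is a minimal illustration, not gemini's code: genotypes are assumed to be strings like "A/T", a missing parent is assumed to be passed as None, and the function names are hypothetical.

```python
def is_het(gt):
    """True if the two alleles of an "X/Y" genotype string differ."""
    a, b = gt.split("/")
    return a != b


def is_hom_ref(gt, ref):
    """True if both alleles equal the reference allele."""
    return gt == f"{ref}/{ref}"


def keep_x_recessive_site(male_gt, mom_gt, dad_gt, ref):
    """Drop sites where a (hemizygous) male is called het but both parents are hom-ref.

    If a parent is missing, the rule cannot be evaluated, so the site is kept
    (an assumption made for this sketch).
    """
    if mom_gt is None or dad_gt is None:
        return True
    return not (is_het(male_gt) and is_hom_ref(mom_gt, ref) and is_hom_ref(dad_gt, ref))


def report_x_de_novo(kid_gt, mom_gt, dad_gt, ref):
    """Report an X-linked de novo only when both parents are present and hom-ref."""
    if mom_gt is None or dad_gt is None:
        return False  # don't report X-linked DNs when a parent is missing
    return (not is_hom_ref(kid_gt, ref)
            and is_hom_ref(mom_gt, ref)
            and is_hom_ref(dad_gt, ref))
```

For example, `keep_x_recessive_site("A/T", "A/A", "A/A", "A")` returns False, so that site would be excluded.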

brentp, Jan 09 '19 16:01

Presumably issue #903 is included as well as issue #843.

jxchong, Jan 09 '19 18:01

When you update gnomad, will you include the subcohorts (i.e. frequencies from the controls-only and non-neuro cohorts)?

jxchong, Jan 09 '19 18:01

I updated the list to include those items. Thanks for the reminders.

brentp, Jan 09 '19 22:01

Taking de novos into account in the comp_het inheritance model would be nice!

pfpjs, Jan 10 '19 12:01

I think it would be great to add support for de novos when reporting candidate CHs. The challenge is that, in the common case of trios, one cannot be confident that a de novo forms a true CH without read-backed phasing. As such, this would have to be either part of the relaxed-priority CH searches or a specific mode.

arq5x, Jan 10 '19 14:01

Yeah, the variant could be on the same haplotype as the inherited one and thus not be a real comp_het, but perhaps a new lenient mode, or a flag for these specific cases, would help. Also, this applies to diploid organisms (human specifically, in my case); for other ploidies, things could get ugly fast.

pfpjs, Jan 10 '19 14:01

Agree it'd be nice. I've used this hack to get at the results:

Run comp_hets --max-priority 2, which will give you all potential CH pairs, including those that are de_novo + 2nd allele. This yields a lot of false positives, but you can winnow down the results by cross-referencing the gene list with the results of de_novo and looking for overlapping genes.
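The gene cross-referencing step of this hack can be sketched in Python. This assumes both outputs were saved as tab-separated text with a header row containing a `gene` column; the column name and helper names are assumptions, so adjust them to the actual output.

```python
import csv
import io


def genes_from_tsv(text, gene_col="gene"):
    """Collect the non-empty values of the gene column from tab-separated text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[gene_col] for row in reader if row.get(gene_col)}


def overlapping_genes(comp_het_tsv, de_novo_tsv, gene_col="gene"):
    """Genes that appear in both the comp_hets and the de_novo output."""
    return genes_from_tsv(comp_het_tsv, gene_col) & genes_from_tsv(de_novo_tsv, gene_col)


# Toy inputs standing in for the two saved outputs.
comp_hets_out = "chrom\tstart\tgene\n1\t100\tBRCA1\n2\t200\tTTN\n"
de_novo_out = "chrom\tstart\tgene\n3\t300\tTTN\n4\t400\tSCN1A\n"
print(sorted(overlapping_genes(comp_hets_out, de_novo_out)))  # ['TTN']
```

In practice, you would read the two saved result files into strings and pass them in.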

jxchong, Jan 10 '19 17:01

@jxchong, thank you for that hint!

pfpjs, Jan 10 '19 18:01

I'll see what I can do about comp-het + DN. Though it can't be phased, it could be a special category.

brentp, Jan 10 '19 18:01

@brentp https://github.com/arq5x/gemini/issues/911 and https://github.com/arq5x/gemini/issues/910 are two regressions from the 0.20 version.

bgruening, Jan 13 '19 14:01

Looks like dgidb-related functionality only requires a URL update to get it working again.

wm75, Jan 13 '19 16:01

@wm75 can you make a PR or let me know the URL? I'm not familiar with dgidb.

brentp, Jan 13 '19 16:01

Sure, just need to get my laptop to look it up.

wm75, Jan 13 '19 16:01

Ok, so you'd use:

```python
dgidb_url = 'http://www.dgidb.org/api/v2/interactions.json?genes='
```

to re-enable things; at least, that's the minimal change.

What I'd prefer, though, is for URLs (the dgidb URL, the install-data URL, others?) to become part of the config file. This would give, e.g., Galaxy admins a single place to locate these URLs and update them if they break at some point. I realize that is a slightly bigger change, but if you have a little bit of time before the release, I think it would help a lot with long-term support of any gemini install.
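A minimal sketch of the config-file idea, with the default taken from the URL quoted above. The `urls` / `dgidb` key layout is made up for illustration (the thread indicates the URL is currently hard-coded in gemini/dgidb.py), and the sketch assumes gene symbols are appended to the base URL as a comma-separated list.

```python
from urllib.parse import quote

# Default from the comment above; a (hypothetical) config entry can override it.
DEFAULT_DGIDB_URL = "http://www.dgidb.org/api/v2/interactions.json?genes="


def dgidb_base_url(config):
    """Prefer a hypothetical config['urls']['dgidb'] entry, else fall back to the default."""
    return config.get("urls", {}).get("dgidb", DEFAULT_DGIDB_URL)


def dgidb_query_url(genes, config=None):
    """Append a comma-separated, URL-escaped gene list to the base URL."""
    base = dgidb_base_url(config or {})
    return base + ",".join(quote(g, safe="") for g in genes)


print(dgidb_query_url(["BRAF", "EGFR"]))
```

Keeping the default in code means an install without the new config entry keeps working, while an admin can repoint the URL without patching the source.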

wm75, Jan 13 '19 17:01

@bgruening we've been talking about this before. The above line is the patch to https://github.com/arq5x/gemini/blob/8a3e7571b15cec31f701161f6efe38bc624be028/gemini/dgidb.py#L28 that you'd have to apply to gemini on usegalaxy.eu to make the actionable mutations and the --dgidb option of the query tool work again.

wm75, Jan 13 '19 17:01

Great, thank you! So the dgidb stuff is getting used? If so, I can keep it, as long as it's this simple a URL change to fix.

brentp, Jan 13 '19 17:01

@wm75 fixed on our server.

bgruening, Jan 13 '19 17:01

@bgruening it's working :smile:

@brentp yeah, we have an ongoing project with clinicians, for whom being able to get at gene-drug interactions is a major selling point. So I'd like to see more such functionality, not less.

wm75, Jan 13 '19 19:01

> When you update gnomad, will you include the subcohorts (i.e. frequencies from the controls-only and non-neuro cohorts)?

@jxchong are these the only additions from gnomad? I'll post a list of what I plan to add for final verification, but an initial set would be helpful.

brentp, Jan 15 '19 14:01

Any possibility of updating CADD to the new v1.4?

pfpjs, Jan 15 '19 15:01

I have added the CADD update to the list.

brentp, Jan 15 '19 18:01

All, I have implemented comp-het + de novo (CHDN) in a specific way, and I'm looking for problems I might not have foreseen. Currently, the highest priority is 1. I have given a "good" CHDN candidate a priority of 1.5 so one can filter on it (I will change the gemini max-priority parameter to accept floats).

A good candidate, with genotypes listed as kid, mom, dad, would be:

```
# het inherited from mom:
A/T, A/T, A/A
# DN
C/T, C/C, C/C
```
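The pattern above can be expressed as a pair of genotype checks. This is a hypothetical sketch of the idea, not the actual CHDN implementation: each site is assumed to be a (kid, mom, dad, ref) tuple of "X/Y" genotype strings, and a pair qualifies when one site is an inherited het and the other a de novo het. As noted earlier in the thread, the two variants cannot be phased, so a match is only a candidate.

```python
CHDN_PRIORITY = 1.5  # priority proposed in the comment above


def is_het(gt):
    """True if the two alleles of an "X/Y" genotype string differ."""
    a, b = gt.split("/")
    return a != b


def is_inherited_het(kid, mom, dad, ref):
    """Kid is het, exactly one parent is het, and the other parent is hom-ref."""
    hom_ref = f"{ref}/{ref}"
    if not is_het(kid):
        return False
    return (is_het(mom) and dad == hom_ref) or (is_het(dad) and mom == hom_ref)


def is_de_novo_het(kid, mom, dad, ref):
    """Kid is het and both parents are hom-ref."""
    hom_ref = f"{ref}/{ref}"
    return is_het(kid) and mom == hom_ref and dad == hom_ref


def chdn_priority(site_a, site_b):
    """Return CHDN_PRIORITY if one site is inherited het and the other is DN het, else None."""
    for inherited, de_novo in ((site_a, site_b), (site_b, site_a)):
        if is_inherited_het(*inherited) and is_de_novo_het(*de_novo):
            return CHDN_PRIORITY
    return None


# The example from the comment: het inherited from mom, plus a DN in the same gene.
print(chdn_priority(("A/T", "A/T", "A/A", "A"), ("C/T", "C/C", "C/C", "C")))  # 1.5
```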

A somewhat arbitrary requirement I have added is that the DN het cannot occur in any unaffected sample in the family. So, if we add an unaffected sib to the above trio (kid, mom, dad, sib) and the sib also has the de novo:

```
# het inherited from mom only in proband:
A/T, A/T, A/A, A/A
# DN in both kids
C/T, C/C, C/C, C/T
```

This is not reported as a candidate, since an unaffected sample (the sib) shares the DN. This could miss cases of germline mosaicism in a parent, but I think it will also remove a lot of false positives.
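The unaffected-sample requirement reduces to a single check on the DN site. A minimal sketch, assuming genotypes and affection status are given as dicts keyed by (hypothetical) sample names:

```python
def is_het(gt):
    """True if the two alleles of an "X/Y" genotype string differ."""
    a, b = gt.split("/")
    return a != b


def dn_absent_from_unaffected(dn_genotypes, affected):
    """Return False if any unaffected family member is het at the DN site.

    dn_genotypes maps sample name -> genotype at the DN site;
    affected maps sample name -> True/False.
    """
    return not any(
        is_het(gt) for sample, gt in dn_genotypes.items() if not affected[sample]
    )


# The kid, mom, dad, sib example: the unaffected sib shares the DN, so it is rejected.
genotypes = {"kid": "C/T", "mom": "C/C", "dad": "C/C", "sib": "C/T"}
status = {"kid": True, "mom": False, "dad": False, "sib": False}
print(dn_absent_from_unaffected(genotypes, status))  # False
```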

Any thoughts on this approach? I want to avoid extra flags and arguments and just provide a 95% solution here.

brentp, Jan 15 '19 19:01

Another possible addition is the gnomad exomes update to 2.1: http://gnomad.broadinstitute.org/downloads

Update -- never mind, just saw you already mentioned gnomad exomes in your first list.

oleraj, Jan 15 '19 20:01

@oleraj that's part of the gnomad item above. I'm testing that now. The file is much larger so I'm trying to cull it a bit.

brentp, Jan 15 '19 20:01

Will this be a separate genetic model, or will it be part of the existing comp_hets? The approach seems fine to me for now (others should weigh in too).

oleraj, Jan 15 '19 20:01

It will be part of comp-het and won't require any change (other than bumping the max-priority to > 1.5).

brentp, Jan 15 '19 20:01

@brentp The approach seems fine to me as well. If you are adding CHDN, it'd be nice to add simple multi-generation support to the de_novo model as requested in #885 (i.e. de novo in generation 1 and passed down as autosomal dominant to a child in generation 2), since de novo->AD is more common than CHDN. I think it'd be relatively simple to add without requiring a ton of flags: you could change the requirements so that, in a given family, the variant must be de novo at least once (two unaffected, non-carrier parents and their carrier child) while allowing affected offspring of that child to be het.
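One way to express the proposed de novo->AD rule, as a hypothetical sketch rather than a spec: require at least one de novo event (a het child of two unaffected, hom-ref parents), and accept every other carrier only if it is an affected het with a carrier parent. The pedigree is assumed to map each sample to (mother, father), with None for unknown parents; requiring the de novo child to be affected is an added assumption.

```python
def is_het(gt):
    """True if the two alleles of an "X/Y" genotype string differ."""
    a, b = gt.split("/")
    return a != b


def is_hom_ref(gt, ref):
    """True if both alleles equal the reference allele."""
    return gt == f"{ref}/{ref}"


def de_novo_then_dominant(pedigree, affected, gts, ref):
    """pedigree: sample -> (mother, father), None for unknown parents."""
    de_novo = set()
    for sample, (mom, dad) in pedigree.items():
        if mom is None or dad is None:
            continue  # cannot call a DN without both parents
        if (affected[sample] and is_het(gts[sample])
                and not affected[mom] and not affected[dad]
                and is_hom_ref(gts[mom], ref) and is_hom_ref(gts[dad], ref)):
            de_novo.add(sample)
    if not de_novo:
        return False  # the variant must arise de novo at least once

    carriers = {s for s, gt in gts.items() if not is_hom_ref(gt, ref)}
    for sample in carriers - de_novo:
        parents = [p for p in pedigree[sample] if p is not None]
        inherited = any(not is_hom_ref(gts[p], ref) for p in parents)
        # Every other carrier must be an affected het who inherited the variant.
        if not (affected[sample] and is_het(gts[sample]) and inherited):
            return False
    return True


# Three generations: unaffected hom-ref grandparents, an affected DN parent,
# and an affected het grandchild.
pedigree = {"gm": (None, None), "gf": (None, None), "parent": ("gm", "gf"),
            "spouse": (None, None), "grandkid": ("parent", "spouse")}
affected = {"gm": False, "gf": False, "parent": True, "spouse": False, "grandkid": True}
gts = {"gm": "A/A", "gf": "A/A", "parent": "A/T", "spouse": "A/A", "grandkid": "A/T"}
print(de_novo_then_dominant(pedigree, affected, gts, "A"))  # True
```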

jxchong, Jan 15 '19 21:01

@jxchong would you have a look at https://github.com/arq5x/gemini/commit/513f53fe2a2e07c25af6e7dd27db74ac25388339 and verify it meets your needs for the non-neuro and controls AFs? I just took those 2 fields from the VCF and added them to the database.
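Pulling the two subcohort AFs out of a gnomAD INFO field might look like the sketch below. The INFO keys `controls_AF` and `non_neuro_AF` are assumed from the gnomAD 2.1 VCFs and should be checked against the file's header. The database column names are hypothetical, and records are assumed to be decomposed to one ALT allele each, so every AF is a single number.

```python
def parse_info(info):
    """Parse a VCF INFO string ("k=v;k2=v2;FLAG") into a dict; flags map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


# Assumed gnomAD 2.1 INFO keys -> hypothetical database column names.
SUBCOHORT_FIELDS = {"controls_AF": "gnomad_controls_af",
                    "non_neuro_AF": "gnomad_non_neuro_af"}


def subcohort_afs(info):
    """Return {column: float or None} for the two subcohort allele frequencies."""
    parsed = parse_info(info)
    result = {}
    for vcf_key, column in SUBCOHORT_FIELDS.items():
        value = parsed.get(vcf_key)
        if isinstance(value, str) and value not in ("", "."):
            result[column] = float(value)
        else:
            result[column] = None  # field absent or missing
    return result


print(subcohort_afs("AC=3;AF=0.001;controls_AF=0.0005;non_neuro_AF=0.002;lcr"))
```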

brentp, Jan 15 '19 21:01

@brentp commit 513f53f looks good. Do you think others might have a use for the non-cancer control set too? Perhaps yes, especially when analyzing mosaic variants? @oleraj

jxchong, Jan 15 '19 22:01

Is anyone against removing ESP? Given its size relative to newer resources, removing it would drop a few columns...

brentp, Jan 15 '19 23:01