puzzle
Search through all cases
We should work out how to search for variants across all cases. With the gemini and mongo adapters this will be straightforward; we just need to figure out how the interface should handle it.
EDIT: from #213
- [ ] should be able to decide which individuals to include (from multiple cases)
Are we still targeting this for the VCF-adapter? Will tabix make this easier?
We can try to do it for the VCF adapter as well. One of the challenges is how it should be visualized in puzzle. I have limited experience with this kind of analysis, so if anyone else has ideas please post them!
I don't :-/ I'll ping @ohofmann just in case he has an idea of how this should be visualized.
I don't think we'll be looking for all-variants-in-all-cases regularly. All variants at a given location, maybe (which will rarely be larger than a gene), and that's mostly for context. Could do this via ad hoc aggregation (sum of samples with this mutation, with mutations of a certain impact in this gene, etc.).
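The ad hoc aggregation mentioned above could look roughly like the sketch below. It is not puzzle code: the flat list of call dictionaries and the field names (`case_id`, `sample_id`, `variant_id`, `gene`, `impact`) are assumptions made for illustration, and a real adapter would likely push this down into a gemini or mongo query.

```python
from collections import defaultdict


def carriers_per_variant(calls):
    """Map each variant to the number of distinct samples carrying it."""
    carriers = defaultdict(set)
    for call in calls:
        # A sample is identified by its case plus its own id (assumed layout).
        carriers[call["variant_id"]].add((call["case_id"], call["sample_id"]))
    return {variant: len(samples) for variant, samples in carriers.items()}


def carriers_per_gene(calls, impact):
    """Map each gene to the number of distinct samples with a variant of the given impact."""
    carriers = defaultdict(set)
    for call in calls:
        if call["impact"] == impact:
            carriers[call["gene"]].add((call["case_id"], call["sample_id"]))
    return {gene: len(samples) for gene, samples in carriers.items()}


# Hypothetical example data, one dict per (sample, variant) call.
calls = [
    {"case_id": "case1", "sample_id": "s1", "variant_id": "1-100-A-G", "gene": "GENE1", "impact": "HIGH"},
    {"case_id": "case2", "sample_id": "s2", "variant_id": "1-100-A-G", "gene": "GENE1", "impact": "HIGH"},
    {"case_id": "case2", "sample_id": "s2", "variant_id": "1-250-C-T", "gene": "GENE1", "impact": "LOW"},
]
```

Counting distinct (case, sample) pairs rather than raw calls keeps a sample from being counted twice if the same call shows up more than once.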
@Bianca-T might have some ideas/use cases
I think it is very useful to have a function for searching all the cases. Would use it for:
- search for mutations in all cases in a specific gene (great if frequency or impact filters could be added to this query); for example whenever a new gene is published in my field of research I currently use a gemini query to look for all variants (with certain characteristics) in my whole database;
- search for a specific variant/mutation in all cases (this could also help identify recurrent false positive variants);
Ideally there would be a search bar that works a bit like the ExAC browser: gene level, variant level, transcript level, rs ID level, and even region level, e.g. for regulatory elements or for SV breakpoint regions.
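A single search bar needs some way to tell these query types apart. The sketch below is one possible approach, not an existing puzzle function: `classify_query` and the accepted formats (`rs...` for rs IDs, `ENST...` for transcripts, `chrom:start-end` for regions, `chrom-pos-ref-alt` for variants) are assumptions, and anything that matches none of them is treated as a gene symbol.

```python
import re

# (kind, pattern) pairs tried in order; the formats are assumptions for illustration.
PATTERNS = [
    ("rsid", re.compile(r"^rs\d+$", re.IGNORECASE)),
    ("transcript", re.compile(r"^ENST\d+(\.\d+)?$", re.IGNORECASE)),
    ("region", re.compile(r"^(?:chr)?(\w+):(\d+)-(\d+)$", re.IGNORECASE)),
    ("variant", re.compile(r"^(?:chr)?(\w+)-(\d+)-([ACGT]+)-([ACGT]+)$", re.IGNORECASE)),
]


def classify_query(text):
    """Return the kind of search the user most likely meant."""
    query = text.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(query):
            return kind
    # Fall back to a gene symbol search when nothing else matches.
    return "gene"
```

For example, `classify_query("rs12345")` would be routed to an rs ID lookup and `classify_query("BRCA1")` to a gene search; each kind could then be dispatched to the matching adapter query.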
So is there any idea how we would present the result of such a query?
I would suggest a summary table with a few relevant fields (e.g. those in the table on the variant page: genomic position, nucleotide change, gene, effect, etc.), plus the sample that carries the variant and his/her genotype at that variant. Even better with a hyperlink to the specific variant page for each case so that it can be explored further.
Disclaimer: I have no idea how hard this would be to implement :smile:
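To make the summary table suggestion concrete, here is a rough sketch of how the rows could be built for one variant across all cases. Everything in it is hypothetical: the `cases` structure, the field names, the genotype strings, and the `/<case_id>/variants/<variant_id>` link pattern are assumptions, not puzzle's actual data model or routes.

```python
def summary_rows(cases, variant_id):
    """Build one table row per carrier of the variant, across all cases."""
    rows = []
    for case in cases:
        for variant in case["variants"]:
            if variant["variant_id"] != variant_id:
                continue
            for sample_id, genotype in sorted(variant["genotypes"].items()):
                # Skip samples that are homozygous reference or have no call.
                if genotype in ("0/0", "./."):
                    continue
                rows.append({
                    "position": "{}:{}".format(variant["chrom"], variant["pos"]),
                    "change": "{}>{}".format(variant["ref"], variant["alt"]),
                    "gene": variant["gene"],
                    "effect": variant["effect"],
                    "case": case["case_id"],
                    "sample": sample_id,
                    "genotype": genotype,
                    # Hypothetical link to the variant page within that case.
                    "link": "/{}/variants/{}".format(case["case_id"], variant_id),
                })
    return rows


# Hypothetical example data: one case with a carrier and a non-carrier.
cases = [
    {
        "case_id": "case1",
        "variants": [
            {
                "variant_id": "1-100-A-G", "chrom": "1", "pos": 100,
                "ref": "A", "alt": "G", "gene": "GENE1", "effect": "missense",
                "genotypes": {"s1": "0/1", "s2": "0/0"},
            },
        ],
    },
]
```

With this layout, each row carries enough to render the table and to link back to the per-case variant page for further exploration.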