
The genotype_extractor worker needs to be robust

Open ihodes opened this issue 10 years ago • 4 comments

If, for example, the VCF is inserted, but the genotypes cannot be, we have a somewhat inconsistent database. We should at least report errors somewhere more obvious.

ihodes avatar Nov 20 '14 22:11 ihodes

For non-critical constraints, I've seen this done with an integrity check script that is run periodically by a cron job. In other words, be optimistic that the write works and just look for failures at regular intervals. @danvk probably has better examples from Kansas and whatever other BigTable deployments he worked with at Google.
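A minimal sketch of what such a periodic check could look like, assuming a hypothetical schema in which a `genotypes` table references a `vcfs` table through a `vcf_id` column. The table names, column names, and the use of SQLite are illustrative assumptions, not the actual cycledash schema:

```python
import sqlite3


def find_vcfs_missing_genotypes(conn):
    """Return ids of vcfs rows that have no matching genotypes rows."""
    rows = conn.execute(
        """
        SELECT v.id
        FROM vcfs v
        LEFT JOIN genotypes g ON g.vcf_id = v.id
        WHERE g.vcf_id IS NULL
        ORDER BY v.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo_connection():
    """Build an in-memory database with one consistent and one orphaned VCF.

    The schema here is hypothetical and exists only to exercise the check.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vcfs (id INTEGER PRIMARY KEY, uri TEXT NOT NULL);
        CREATE TABLE genotypes (
            vcf_id INTEGER NOT NULL REFERENCES vcfs(id),
            contig TEXT NOT NULL,
            position INTEGER NOT NULL
        );
        INSERT INTO vcfs (id, uri) VALUES (1, 'ok.vcf'), (2, 'orphan.vcf');
        INSERT INTO genotypes (vcf_id, contig, position) VALUES (1, 'chr1', 100);
        """
    )
    return conn


if __name__ == "__main__":
    missing = find_vcfs_missing_genotypes(demo_connection())
    if missing:
        # A cron wrapper could mail this output or exit non-zero to alert.
        print("VCFs with no genotypes:", missing)
```

Run from cron, a script like this would surface any VCF whose genotype extraction failed, without adding checks to the write path itself.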

hammer avatar Nov 20 '14 22:11 hammer

Somehow I managed to work eight years at Google without ever writing data to bigtable or Kansas!

danvk avatar Nov 20 '14 22:11 danvk

This should also be idempotent: running it twice on the same VCF shouldn't duplicate genotypes. Cf. https://github.com/hammerlab/cycledash/issues/489
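One way to get that property, sketched here under assumptions: put a uniqueness constraint on whatever identifies a genotype row and have the worker skip rows that already exist. The column set below is hypothetical, and SQLite's `INSERT OR IGNORE` stands in for whatever the production database offers:

```python
import sqlite3


def make_db():
    """Create an in-memory genotypes table with a hypothetical natural key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE genotypes (
            vcf_id INTEGER NOT NULL,
            contig TEXT NOT NULL,
            position INTEGER NOT NULL,
            reference TEXT NOT NULL,
            alternates TEXT NOT NULL,
            sample_name TEXT NOT NULL,
            UNIQUE (vcf_id, contig, position, reference, alternates, sample_name)
        )
        """
    )
    return conn


def insert_genotypes(conn, vcf_id, records):
    """Insert records for vcf_id, skipping rows that already exist.

    Returns the number of genotype rows stored for vcf_id afterwards.
    """
    # "with conn" commits on success and rolls back if an exception is raised,
    # so a failed run leaves no partial batch behind.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO genotypes VALUES (?, ?, ?, ?, ?, ?)",
            [(vcf_id,) + tuple(record) for record in records],
        )
    return conn.execute(
        "SELECT COUNT(*) FROM genotypes WHERE vcf_id = ?", (vcf_id,)
    ).fetchone()[0]
```

With this in place, re-running the extractor on the same VCF leaves the row count unchanged, because the second pass hits the constraint and is ignored.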

ihodes avatar Jun 02 '15 20:06 ihodes

Related: #494

hammer avatar Jun 04 '15 15:06 hammer