cycledash
The genotype_extractor worker needs to be robust
If, for example, the VCF is inserted, but the genotypes cannot be, we have a somewhat inconsistent database. We should at least report errors somewhere more obvious.
For non-critical constraints, I've seen this done with an integrity check script that is run periodically by a cron job. In other words, be optimistic about it working and just look for failures at regular intervals. @danvk probably has better examples from Kansas and whatever other Bigtable deployments he worked with at Google.
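A rough sketch of what such a periodic check could look like, to make the idea concrete. The `vcfs` and `genotypes` tables, their columns, and the function names are hypothetical and are not cycledash's actual schema; `sqlite3` is used only so the example is self-contained.

```python
import sqlite3


def find_vcfs_missing_genotypes(conn):
    """Return the ids of VCF rows that have no genotype rows at all."""
    rows = conn.execute(
        """
        SELECT v.id
        FROM vcfs AS v
        LEFT JOIN genotypes AS g ON g.vcf_id = v.id
        WHERE g.vcf_id IS NULL
        ORDER BY v.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def make_check_demo_db():
    """Build a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE vcfs (id INTEGER PRIMARY KEY, uri TEXT);
        CREATE TABLE genotypes (
            vcf_id INTEGER, sample_name TEXT, contig TEXT, position INTEGER
        );
        INSERT INTO vcfs (id, uri) VALUES (1, 'a.vcf');
        INSERT INTO vcfs (id, uri) VALUES (2, 'b.vcf');
        -- Only VCF 1 had its genotypes extracted; VCF 2 is inconsistent.
        INSERT INTO genotypes VALUES (1, 'NORMAL', '20', 14370);
        """
    )
    return conn


if __name__ == "__main__":
    # A cron job could run this and alert when the list is non-empty.
    missing = find_vcfs_missing_genotypes(make_check_demo_db())
    if missing:
        print("VCFs without genotypes:", missing)
```

The check is read-only, so it is safe to run at any interval. A freshly submitted VCF whose worker has not finished yet would also show up, so a real version would probably ignore rows younger than some grace period.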
Somehow I managed to work eight years at Google without ever writing data to Bigtable or Kansas!
This should also be idempotent: running it on the same VCF shouldn't duplicate genotypes (cf. https://github.com/hammerlab/cycledash/issues/489).
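One way to get that property, sketched under assumptions: replace all genotypes for a VCF inside a single transaction, so a re-run produces the same rows and a failure partway through leaves the previous state intact. The `genotypes` table, its columns, and `replace_genotypes` are hypothetical names, not the worker's actual code; `sqlite3` is used only to keep the example self-contained.

```python
import sqlite3


def replace_genotypes(conn, vcf_id, genotypes):
    """Store genotypes for vcf_id so that re-running leaves the same rows.

    genotypes is an iterable of (sample_name, contig, position) tuples.
    """
    # "with conn" commits on success and rolls back if an exception is
    # raised, so the delete and the inserts succeed or fail together.
    with conn:
        conn.execute("DELETE FROM genotypes WHERE vcf_id = ?", (vcf_id,))
        conn.executemany(
            "INSERT INTO genotypes (vcf_id, sample_name, contig, position) "
            "VALUES (?, ?, ?, ?)",
            [(vcf_id, sample, contig, pos) for (sample, contig, pos) in genotypes],
        )


def make_genotype_demo_db():
    """Build an empty in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genotypes ("
        "vcf_id INTEGER NOT NULL, sample_name TEXT NOT NULL, "
        "contig TEXT NOT NULL, position INTEGER NOT NULL)"
    )
    return conn


if __name__ == "__main__":
    conn = make_genotype_demo_db()
    rows = [("NORMAL", "20", 14370), ("TUMOR", "20", 14370)]
    replace_genotypes(conn, 1, rows)
    replace_genotypes(conn, 1, rows)  # second run must not add duplicates
    print(conn.execute("SELECT COUNT(*) FROM genotypes").fetchone()[0])
```

An alternative would be a unique constraint on the genotype key plus an insert that skips conflicts; delete-then-insert is shown here because it also cleans up rows left by an earlier, partially failed run.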
Related: #494