hyperflow
Provenance store
- Set up a provenance store (database) suitable for the HyperFlow provenance model
- Implement "flushing" of provenance events logged during workflow execution to this store
- The store needs to support common provenance queries (such as the lineage graph of a piece of data or a signal)
- It should probably be a more general store for "workflow execution history" that supports not only provenance queries but other kinds of queries as well
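The "flushing" idea above can be sketched as a small buffered logger. This is only an illustration under stated assumptions: the class names, the event shape (`type`, `signal`, `index`) and the `writeBatch` store method are hypothetical and are not part of HyperFlow, and the in-memory store stands in for a real database.

```javascript
// Hypothetical buffered logger: collects provenance events in memory and
// flushes them to a store in batches. Names are illustrative, not HyperFlow API.
class ProvenanceLogger {
  constructor(store, batchSize = 100) {
    this.store = store;        // any object with an async writeBatch(events)
    this.batchSize = batchSize;
    this.buffer = [];
  }

  // Record one event; flush automatically once the buffer is full.
  async log(event) {
    this.buffer.push(event);
    if (this.buffer.length >= this.batchSize) {
      await this.flush();
    }
  }

  // Write all buffered events to the store as a single batch.
  async flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];          // swap first so events logged meanwhile are kept
    await this.store.writeBatch(batch);
  }
}

// In-memory stand-in for a real database, for demonstration only.
class MemoryStore {
  constructor() {
    this.events = [];
    this.batches = 0;
  }
  async writeBatch(events) {
    this.events.push(...events);
    this.batches += 1;
  }
}

async function demo() {
  const store = new MemoryStore();
  const logger = new ProvenanceLogger(store, 2);
  await logger.log({ type: 'read', signal: 'a', index: 1 });
  await logger.log({ type: 'write', signal: 'b', index: 1 }); // triggers a flush
  await logger.log({ type: 'read', signal: 'b', index: 1 });
  await logger.flush();                                       // flush the remainder
  return store;
}
```

Batching writes this way keeps the number of round trips to the store small, at the cost of losing buffered events if the process exits before the final flush.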
Provenance logging is implemented in engine2/process.js. By default it is disabled by a flag set in engine2/index.js:
this.logProvenance = false;
A first prototype is ready. Provenance data, in the form of event nodes (tuples), is stored in a Neo4j database with two relationship types:
- events connected by a write/read of the same signal (and signal index)
- signal writes dependent on other signal reads
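To make the two relationship types concrete, here is a minimal in-memory sketch of how a lineage query could traverse them. It is not the actual Neo4j schema: the class, the field names and the `"signal:index"` key format are assumptions made for illustration.

```javascript
// Illustrative in-memory model; the event fields and the key format are
// assumptions, not the actual HyperFlow/Neo4j schema.
class ProvenanceGraph {
  constructor() {
    // write key ("signal:index") -> read events that the write depends on
    this.dependsOn = new Map();
  }

  static key(signal, index) {
    return `${signal}:${index}`;
  }

  // Record that process `proc` wrote signal[index] after reading `reads`.
  addWrite(proc, signal, index, reads = []) {
    const k = ProvenanceGraph.key(signal, index);
    this.dependsOn.set(k, reads.map((r) => ({ type: 'read', proc, ...r })));
  }

  // Lineage of signal[index]: every upstream signal instance it derives from.
  lineage(signal, index, seen = new Set()) {
    const k = ProvenanceGraph.key(signal, index);
    for (const read of this.dependsOn.get(k) || []) {
      // A read is linked to the write of the same signal and index.
      const upstream = ProvenanceGraph.key(read.signal, read.index);
      if (!seen.has(upstream)) {
        seen.add(upstream);
        this.lineage(read.signal, read.index, seen);
      }
    }
    return [...seen];
  }
}

const g = new ProvenanceGraph();
g.addWrite('p1', 'b', 1, [{ signal: 'a', index: 1 }]);
g.addWrite('p2', 'c', 1, [{ signal: 'b', index: 1 }]);
const result = g.lineage('c', 1); // ['b:1', 'a:1']
```

The traversal alternates between the two relationships: from a write to the reads it depends on, then from each read to the write of the same signal and index.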
TODOs:
- optimize database access and investigate the suspiciously slow writes to the database
- add support for simultaneous execution of multiple workflows
- reduce the deep callback nesting
- think about special treatment of signal events with initial values (maybe 'virtual write' events?)
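One possible way to approach the callback nesting item is to wrap Node-style callback functions with `util.promisify` and sequence them with `async`/`await`. The sketch below assumes hypothetical `createNode` and `createRelationship` functions standing in for the database calls; they are not the functions used in engine2/process.js, and `DEPENDS_ON` is an illustrative relationship name.

```javascript
const { promisify } = require('util');

// Hypothetical Node-style callback functions standing in for database calls.
function createNode(event, cb) {
  setImmediate(() => cb(null, { id: event.id }));
}
function createRelationship(from, to, type, cb) {
  setImmediate(() => cb(null, { from: from.id, to: to.id, type }));
}

const createNodeAsync = promisify(createNode);
const createRelationshipAsync = promisify(createRelationship);

// Flat, sequential version of what would otherwise be three nested callbacks.
async function storeReadWrite(readEvent, writeEvent) {
  const readNode = await createNodeAsync(readEvent);
  const writeNode = await createNodeAsync(writeEvent);
  return createRelationshipAsync(writeNode, readNode, 'DEPENDS_ON');
}
```

Errors passed to any of the callbacks surface as rejected promises, so a single `try`/`catch` around the `await` calls can replace per-callback error checks.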