hydroflow icon indicating copy to clipboard operation
hydroflow copied to clipboard

Graduate class project ideas

Open jhellerstein opened this issue 7 months ago • 2 comments

  • [ ] Build out the “stdlib” of importable distributed protocol implementations in Hydroflow(+). Some things are simple (heartbeats, retries/acks), some are very complex (Paxos variants, BFT variants, etc). See #346.
  • [ ] Mostly-monotonic (lattice-oriented) variants of standard distributed protocols. For example, @jhellerstein is working on an implementation of retry/ack that uses a mutable “outstanding messages” buffer. It could instead use a tombstone set with efficient tombstone compression a la Neil Conway’s Edelweiss (something @zzlk has on his plate).
  • [ ] Compiler analysis to do (b) automagically!
    • Identify monotonic use of mutable state, auto-convert to use lattice-based state.
    • Identify “pure local” uses of mutable state that will remain local even under replication (after replication to two nodes, each node keeps that state private from the other). Identify such non-monotonicity as “safe for replication”.
      • Example: TCP-style use of sequence numbering on point-to-point channels is non-monotonic because the protocol for message sequencing “peeks” at the increasing counters. Still, it seems OK to replicate otherwise-monotonic programs that use TCP. What properties assure that is safe for replication, and how do we check for those properties in general?
  • [ ] Egglog + Lattices in Hydroflow
    • Can we match egglog performance? What roadblocks exist?
    • Would the use of Hydroflow's UnionFind lattice help in some way (with perf, analysis, incremental view maintenance)?
  • [ ] High-performance Datalog (and Dedalus) over Hydroflow. c.f. Souffle, Egglog and other leading implementations.
  • [ ] Differential/Incremental execution on Hydroflow
    • In the style of DBSP, and then perhaps Differential Dataflow.
    • Do we have the operators and types required to support manually-constructed incremental flows?
    • Can we elegantly translate from “batch” execution in some language (Datalog? Hydroflow+?) to such incremental implementations?
  • [ ] Supporting additional algebraic types in Hydroflow
    • Abelian Groups like the Z-relations used in DBSP
    • Semirings: general-purpose provenance semirings, lattice semirings (with an idempotent +)?
      • Could a semiring for multisets replace and improve existing multiset support in Hydroflow?
    • Integration: what does it mean to have a dataflow library with multiple algebras at play? Composition? Optimization?
    • Architecture: library of example data structures (like the lattice library) and Hydroflow dataflow operators to suit, and optimizer opportunities.
    • Impact on distributed systems folks, e.g. CRDT fans.
  • [ ] Build a distributed transactional storage system in Hydroflow
    • Perhaps starting with @zzlk's Anna implementation?
    • Work with commodity cloud storage
    • Many design points in this space. Is there a "toolkit + optimizer" approach here? Discuss with @tiemobang.
  • [ ] Practical worst-case optimal joins in Hydroflow
    • Have a look at Free Join
    • Many design points here. Is there a compositional "toolkit + optimizer" approach?
    • What does WCOJ mean for streams and other dynamics? Examine relationship of WCOJ with Eddies/STAIRS.

jhellerstein avatar Nov 29 '23 16:11 jhellerstein