Generalized Upserts
This is something that I know @frankmcsherry has also been thinking about.
I consider this issue a sketchpad of potential directions upserts can take:
- The option to convert any input into upsert as long as the key columns are specified.
- The option to ignore row deletions and only pay attention to insert/update records. An example motivation: in the MBTA demo, predictions get updated until the predicted event actually occurs, at which point the prediction gets deleted. For the sake of historical analysis, it would be nice to just keep the last update for each event without having to create a different Kafka source.
- Frank mentioned the case of determining which update came first by a datetime column instead of by offset.
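
To make the three directions above concrete, here is a minimal Python sketch of the semantics I have in mind. It is not how Materialize implements upserts. The record shape (`id`, `value`, `offset`, `ts`) and the `upsert` function with its `key`, `order_by`, and `ignore_deletes` parameters are all hypothetical, and a `None` value is assumed to stand for a deletion.

```python
from typing import Any, Callable, Dict, Hashable, Iterable, Optional

Record = Dict[str, Any]


def upsert(
    records: Iterable[Record],
    key: Callable[[Record], Hashable],
    order_by: Callable[[Record], Any],
    ignore_deletes: bool = False,
) -> Dict[Hashable, Any]:
    """Keep the latest value per key; a value of None is treated as a delete."""
    latest: Dict[Hashable, Record] = {}
    for rec in records:
        # Optionally pretend deletions never happened.
        if ignore_deletes and rec["value"] is None:
            continue
        k = key(rec)
        # "Latest" is decided by the caller-supplied ordering (offset, timestamp, ...).
        if k not in latest or order_by(rec) >= order_by(latest[k]):
            latest[k] = rec
    # Keys whose latest record is a delete are absent from the result.
    return {k: r["value"] for k, r in latest.items() if r["value"] is not None}


records = [
    {"id": "p1", "value": "arrives 10:05", "offset": 0, "ts": 1},
    {"id": "p1", "value": "arrives 10:07", "offset": 1, "ts": 2},
    {"id": "p1", "value": None, "offset": 2, "ts": 3},  # event occurred, prediction deleted
    {"id": "p2", "value": "arrives 10:20", "offset": 3, "ts": 1},
    {"id": "p3", "value": "late", "offset": 4, "ts": 9},
    {"id": "p3", "value": "stale", "offset": 5, "ts": 8},  # arrives later, but is older
]

by_id = lambda r: r["id"]
plain = upsert(records, by_id, lambda r: r["offset"])
historical = upsert(records, by_id, lambda r: r["offset"], ignore_deletes=True)
by_time = upsert(records, by_id, lambda r: r["ts"])
```

With these inputs, `plain` drops `p1` entirely, `historical` keeps its last prediction (`"arrives 10:07"`), and `by_time` picks `"late"` for `p3` where offset ordering picks `"stale"`.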
Also, see comment here, where I propose the option to create an upsert view on top of a non-upsert source (or view): https://github.com/MaterializeInc/materialize/issues/1576#issuecomment-621961931
I wonder if we can/should revisit this as SQL that one could write, rather than a specialized envelope. For example, it seems very close to a TopK followed by a filter on null values, where the TopK groups by key and orders by offset. We already have an append-only implementation of TopK (I believe all reasonably optimized "upsert" sources derive from append-only raw data). If we made it a bit more opinionated about how it maintains state for LIMIT 1 followed by a filter, we might be able to replace the specialized envelope with SQL, and open up that implementation for other append-only sources beyond "just Kafka".
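
As a rough illustration of the "TopK with LIMIT 1, then filter nulls" idea over append-only input, here is a small Python sketch. It is only a model of the semantics, not the existing TopK implementation; the `AppendOnlyTop1` class and its `push` and `non_null` methods are made-up names, and a `None` value is assumed to represent a deletion. The point is that with append-only input and LIMIT 1, the only state needed per key is the current winner.

```python
from typing import Dict, Hashable, Optional, Tuple


class AppendOnlyTop1:
    """Per key, retain only the row with the largest offset (TopK with LIMIT 1)."""

    def __init__(self) -> None:
        # key -> (offset, value); one entry per key is all the state we keep.
        self._top: Dict[Hashable, Tuple[int, Optional[str]]] = {}

    def push(self, key: Hashable, offset: int, value: Optional[str]) -> None:
        # Input is append-only, so a row is never retracted: it either
        # replaces the current winner for its key or is discarded.
        current = self._top.get(key)
        if current is None or offset > current[0]:
            self._top[key] = (offset, value)

    def non_null(self) -> Dict[Hashable, str]:
        # The "filter on null values" step: keys whose winner is a delete vanish.
        return {k: v for k, (_, v) in self._top.items() if v is not None}


top = AppendOnlyTop1()
for key, offset, value in [("a", 0, "v1"), ("b", 1, "w1"), ("a", 2, "v2"), ("b", 3, None)]:
    top.push(key, offset, value)

result = top.non_null()  # only "a" survives; "b" ended on a delete
```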
Closing as, as far as I can tell, there isn't a specific proposal here!