Consider running transaction function calls in parallel
Currently, multiple :db.fn/call operations within a transaction are executed sequentially. Since these functions are required to be pure, they need not block each other and should be able to run in parallel, with their results collected sequentially.
https://github.com/replikativ/datahike/blob/9e4e619d5f06feedacc9d31ad13da88ac2f409d2/src/datahike/db.cljc#L1468-L1473
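For concreteness, a minimal sketch of the scenario (assuming a Datahike connection conn, attributes already present in the schema, and illustrative function names; the exact transact arity may differ by version):

```clojure
(require '[datahike.api :as d])

;; Two pure transaction functions: each receives the current db value plus any
;; extra arguments and returns tx-data without side effects.
(defn set-status [db id status]
  [[:db/add id :task/status status]])

(defn bump-count [db id]
  (let [n (or (:task/count (d/entity db id)) 0)]
    [[:db/add id :task/count (inc n)]]))

;; Today the two :db.fn/call operations are expanded one after the other; the
;; proposal is to evaluate them in parallel and collect their results in order.
(d/transact conn {:tx-data [[:db.fn/call set-status 1 :done]
                            [:db.fn/call bump-count 2]]})
```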
I think this would require a breaking change, because the current behaviour, inherited from DataScript, evaluates transaction functions against intermediate "speculative" db values that reflect the previous operations applied within the transaction. It would therefore be hard to decide whether a given list of transaction function operations can safely be "expanded" (i.e. evaluated) in parallel, since assertions made by one function and queries run inside a later one can easily overlap and influence each other.
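To make the ordering hazard concrete, a hypothetical pair of functions where the second reads what the first asserts (the names and attributes are illustrative only):

```clojure
(defn fn-a [db]
  [[:db/add -1 :counter/value 1]])

(defn fn-b [db]
  ;; Under the speculative expansion, db already contains fn-a's assertion here,
  ;; so fn-b's result depends on fn-a having run first.
  (let [n (or (d/q '[:find ?v . :where [_ :counter/value ?v]] db) 0)]
    [[:db/add -2 :counter/value (inc n)]]))

;; Evaluating these two calls in parallel against the same db value would change
;; the outcome, which is why safe parallel expansion is hard to decide in general.
(d/transact conn {:tx-data [[:db.fn/call fn-a]
                            [:db.fn/call fn-b]]})
```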
By contrast, Datomic transaction functions are only passed the db value from the start of the transaction, regardless of any other operations that appear before a given function invocation in the tx-data, which makes them ~trivial to parallelize. That is my understanding anyway, based on:
"The transaction processor will lookup the function in its :db/fn attribute, and then invoke it, passing the value of the db (currently, as of the beginning of the transaction)" (https://docs.datomic.com/on-prem/reference/database-functions.html#processing-transaction-functions)
I have never confirmed the Datomic behaviour first-hand, though; perhaps someone can check / correct me :)
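Purely illustrative, assuming the quoted Datomic behaviour holds: if every transaction function receives the db value from the start of the transaction, then fn-b from the earlier sketch reads the pre-transaction state regardless of fn-a, and the expansions become independent, e.g.:

```clojure
(let [db   (d/db conn)           ; immutable value at the start of the transaction
      tx-a (future (fn-a db))    ; both calls see the same db value...
      tx-b (future (fn-b db))]   ; ...so evaluation order no longer matters
  (vec (concat @tx-a @tx-b)))    ; tx-data collected sequentially, in submission order
```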
FWIW Crux also implements the same speculative/serial behaviour as DataScript, since https://github.com/juxt/crux/pull/933
@refset @w9 I would also stick to serializability as the default (same as DataScript) and only opt out, with the user's permission, via fine-grained concurrency/consistency controls. Having said this, we could introduce a new :db.fn/commutative-call; @w9, would something like this work for you? We could then filter those operations out and apply them in parallel to the initial value of the DB (like Datomic).
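A rough sketch of how such an opt-in might be expanded inside the transactor; :db.fn/commutative-call, expand-ops, and the returned shape are all hypothetical, not existing Datahike API:

```clojure
(defn expand-ops [initial-db ops]
  (let [commutative? (fn [op] (and (sequential? op)
                                   (= :db.fn/commutative-call (first op))))
        {commutative true, other false} (group-by commutative? ops)
        ;; Commutative calls all receive the db value from the start of the
        ;; transaction, so they can be expanded in parallel (pmap here) and
        ;; their tx-data collected afterwards in submission order.
        expanded (pmap (fn [[_ f & args]] (apply f initial-db args)) commutative)]
    {:commutative-tx-data (vec (apply concat expanded))
     ;; Remaining operations keep today's sequential, speculative semantics.
     :sequential-ops      other}))
```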