Asynchronous Sources in DataScript

Open · wbrown opened this issue 9 years ago · 4 comments

Currently, DataScript queries are synchronous. When a query is executed, search is called for each pattern, in sequential order, on a record that implements datascript.db.ISearch. These calls are synchronous because they currently operate on an in-memory, immutable BTSet. This makes sense in a single-threaded environment such as JavaScript/ClojureScript, and it works in Clojure as well.
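To make the current synchronous model concrete, here is a simplified TypeScript sketch (not DataScript's code). Search is a hypothetical stand-in for datascript.db.ISearch, a frozen array stands in for the BTSet, and runPatterns calls search once per pattern, in order, with each call returning its results immediately.

```typescript
// Hypothetical model of a datom as [entity, attribute, value].
type Datom = readonly [number, string, unknown];
// A query pattern; null marks an unbound position.
type Pattern = readonly [number | null, string | null, unknown];

// Hypothetical stand-in for datascript.db.ISearch: results are returned immediately.
interface Search {
  search(pattern: Pattern): Datom[];
}

// In-memory, immutable source. A frozen array stands in for the sorted BTSet.
class MemorySource implements Search {
  private readonly datoms: readonly Datom[];

  constructor(datoms: Datom[]) {
    this.datoms = Object.freeze([...datoms]);
  }

  search([e, a, v]: Pattern): Datom[] {
    // Linear scan for brevity; a real index would seek into the sorted set.
    return this.datoms.filter(
      ([de, da, dv]) =>
        (e === null || e === de) && (a === null || a === da) && (v === null || v === dv),
    );
  }
}

// Patterns are searched one after another; each call returns before the next starts.
function runPatterns(source: Search, patterns: Pattern[]): Datom[][] {
  const results: Datom[][] = [];
  for (const p of patterns) {
    results.push(source.search(p));
  }
  return results;
}
```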

This issue proposes adding support for asynchronous data sources to DataScript. The work could be done in a branch or fork of DataScript in the interim, but I would like to see it merged into DataScript if possible.

This opens the following possibilities:

  • Leveraging Clojure parallelism: the search for each pattern could be performed separately and asynchronously.
  • LevelDB support in a Node environment, which requires callbacks.
  • IndexedDB support in a browser environment, which requires callbacks.
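For the first point, the sketch below shows how promise-returning searches let independent pattern lookups overlap instead of running one after another. AsyncSource and searchAsync are hypothetical names, and the sketch assumes the patterns do not depend on each other's results.

```typescript
type Datom = readonly [number, string, unknown];
type Pattern = readonly [number | null, string | null, unknown];

// Hypothetical asynchronous source: results arrive later, through a promise
// (e.g. a wrapper around a callback-based LevelDB or IndexedDB read).
interface AsyncSource {
  searchAsync(pattern: Pattern): Promise<Datom[]>;
}

// Start every pattern's search at once instead of awaiting them one by one.
// Promise.all keeps input order, so results[i] answers patterns[i].
function searchAllConcurrently(source: AsyncSource, patterns: Pattern[]): Promise<Datom[][]> {
  return Promise.all(patterns.map((p) => source.searchAsync(p)));
}
```

In a JavaScript host this only overlaps I/O waits on a single thread; on the JVM, the same shape built on futures could also run the lookups on separate threads.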

In the case of LevelDB and IndexedDB, there are complications when it comes to guaranteeing immutability, but that is a separate subject from asynchronous queries and out of scope for this proposal.

At a high level, we would implement an IAsyncSearch protocol whose methods are expected to return promises or channels. The intermediate query functions would check whether the source object provided implements the IAsyncSearch methods. On a call to datascript.core/q, the sources would be checked. If any of them implement IAsyncSearch, q would block synchronously on the result where the platform supports it; otherwise, it would return a promise or channel.
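The dispatch described above can be sketched in TypeScript for a single source. SyncSearch and AsyncSearch are hypothetical stand-ins for datascript.db.ISearch and the proposed IAsyncSearch, and the duck-typed isAsync check plays the role of a protocol check such as satisfies?. Because JavaScript cannot block, q returns a plain array when the source is synchronous and a promise when it is asynchronous. Real query code would join relations; this sketch simply concatenates each pattern's matches.

```typescript
type Datom = readonly [number, string, unknown];
type Pattern = readonly [number | null, string | null, unknown];

// Hypothetical stand-in for datascript.db.ISearch.
interface SyncSearch {
  search(pattern: Pattern): Datom[];
}
// Hypothetical stand-in for the proposed IAsyncSearch protocol.
interface AsyncSearch {
  searchAsync(pattern: Pattern): Promise<Datom[]>;
}
type Source = SyncSearch | AsyncSearch;

// Runtime check standing in for a protocol check on the source.
function isAsync(source: Source): source is AsyncSearch {
  return typeof (source as AsyncSearch).searchAsync === "function";
}

function q(source: Source, patterns: Pattern[]): Datom[] | Promise<Datom[]> {
  if (isAsync(source)) {
    const asyncSource = source; // narrowed to AsyncSearch
    // JavaScript cannot block on the result, so return a promise instead.
    return Promise.all(patterns.map((p) => asyncSource.searchAsync(p))).then((parts) =>
      ([] as Datom[]).concat(...parts),
    );
  }
  // Synchronous path: same behaviour as today, with no promise overhead.
  const out: Datom[] = [];
  for (const p of patterns) {
    out.push(...source.search(p));
  }
  return out;
}
```

The mixed return type shows where the cost lies: any caller that may receive an asynchronous source has to handle a promise (for example by wrapping the result in Promise.resolve(...)), so asynchrony propagates up the call chain.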

It would be useful to note the requirements, implied or explicit:

  • Support for asynchronous data sources.
  • As little impact as possible on existing DataScript code.
  • As little platform-specific code as possible.
  • Synchronous calls should work as expected, with little performance degradation.
  • JavaScript integration: return a standard JavaScript promise, or accept a callback function.
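For the JavaScript integration requirement, one common pattern is an entry point that returns a standard promise when called without a callback, and reports through a Node-style (err, result) callback when one is given. A TypeScript sketch; qJs and its runQuery parameter are hypothetical names, not DataScript API.

```typescript
// Node-style callback: error first, then the result.
type ResultCallback<T> = (err: Error | null, result?: T) => void;

// Hypothetical JavaScript-facing entry point. runQuery stands in for the actual
// query, which may return a value or a promise. Without a callback it returns a
// standard Promise; with one, it reports through the callback and returns
// undefined, so a rejection is not left unhandled on an ignored promise.
function qJs<T>(runQuery: () => T | Promise<T>, callback?: ResultCallback<T>): Promise<T> | undefined {
  // A throw inside the executor becomes a rejection instead of escaping synchronously.
  const result = new Promise<T>((resolve) => resolve(runQuery()));
  if (!callback) {
    return result;
  }
  const cb = callback;
  result.then(
    (value) => cb(null, value),
    (err: unknown) => cb(err instanceof Error ? err : new Error(String(err))),
  );
  return undefined;
}
```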

There are a few ways to get asynchrony:

  • Native - Clojure promises, and JavaScript promises (ECMAScript 6)
  • core.async channels
  • cljs-promises - ClojureScript promises built on core.async
  • redlobster - ClojureScript promises for Node.
  • promesa - Promises for Clojure and ClojureScript

Native

Clojure already supports promises via promise, and dereferencing one blocks synchronously until it is delivered. ECMAScript 6 also supports promises, but using them would require a JavaScript environment that supports ECMAScript 6.

Implementing native support at this level would require a lot of platform-specific code, but it would work in a JavaScript-targeted environment.

core.async

core.async is well-supported on both Clojure and ClojureScript, but:

  • Integrating with JavaScript would require a callback function to be provided on query call.
  • While Clojure has the synchronously blocking <!! operator, ClojureScript only has the asynchronous <! operator and requires all channel operations to be wrapped in a go block. This would contaminate the DataScript code with core.async.
  • core.async is primarily designed for message passing and synchronization.

core.async does have the advantage of being supported on older JVM and JavaScript platforms.

cljs-promises

cljs-promises is built on core.async and provides a promise facility that addresses one of the concerns above. It still does not solve the problem that the code has to be asynchronous all the way up the call stack. It is also ClojureScript-specific, and does not satisfy the cross-platform requirement.

redlobster

redlobster is a ClojureScript promise facility with strong ties to Node.js, but it has been shown to work in browsers as well if one ignores some of the Node-specific functionality. However, the query call itself would need to be asynchronous and return a promise. It is also ClojureScript-specific.

promesa

promesa provides a cross-platform abstraction layer for both Clojure and ClojureScript.

  • On Clojure, it is built on JDK8 completable futures.
  • On ClojureScript, it is built on bluebird. bluebird is well-accepted in the JavaScript community, so a bluebird promise could be returned if invoked from JavaScript on an asynchronous data source.

A potentially significant drawback is that it requires JDK8, which would transitively impose that requirement on DataScript. However, it fulfills most of the other requirements and would be my choice.

wbrown · Nov 27 '16

For everyone's benefit, the previous issue on this topic is here: https://github.com/tonsky/datascript/issues/22

refset · Nov 29 '16

Just to note, I am interested in this as well and have looked into it several times over the last few years. While I use DataScript and would love to see a durable index, my primary focus is on building a distributed data management system with replikativ, which can also be used for Dat* replication. Since I have faced similar problems in keeping my code cljs-compatible, I decided on core.async and defined IO protocols, e.g. for storage with konserve, on top of supervised async.

There was quite a bit of boring yak-shaving involved, but I would suggest breaking the problem down and building a set of robust cross-platform abstractions on which things like a durable DataScript can be built. In general, cross-platform cljs does not yet have many composable building blocks, and the asynchronous nature of JavaScript sadly requires non-blocking interfaces on the JVM as well, which barely any Clojure library considers. While promises and other async solutions have the partial benefit of not transforming all your code with core.async, the very concise async code that core.async makes possible should be seriously considered. core.async is also fairly solid now. For concrete implementations of limited asynchronous functionality, callbacks are generally preferred in libraries because they do not push core.async onto the user. But once we talk about exposed protocols and interfaces, I would argue that concise blocking semantics allow better composition with less glue code; this is why core.async is considered the async library of the Clojure ecosystem. For example, error handling is hard to do well in an async setting in cljs, and modelling rendezvous points (where the sender is only unblocked once the reader consumes the value) across different async libraries is also non-trivial.
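The rendezvous behaviour described in parentheses can be illustrated without core.async. In the TypeScript sketch below, RendezvousChannel is a hypothetical class, not core.async's implementation or API: put() returns a promise that resolves only after a reader's take() has consumed the value, roughly what an unbuffered core.async channel gives a parked sender.

```typescript
// An unbuffered channel: put() resolves only once a reader has taken the value.
class RendezvousChannel<T> {
  private readonly pendingPuts: Array<{ value: T; done: () => void }> = [];
  private readonly pendingTakes: Array<(value: T) => void> = [];

  put(value: T): Promise<void> {
    return new Promise<void>((resolve) => {
      const taker = this.pendingTakes.shift();
      if (taker) {
        // A reader was already waiting: hand over the value and unblock the sender.
        taker(value);
        resolve();
      } else {
        // No reader yet: the sender stays blocked until take() consumes the value.
        this.pendingPuts.push({ value, done: () => resolve() });
      }
    });
  }

  take(): Promise<T> {
    return new Promise<T>((resolve) => {
      const putter = this.pendingPuts.shift();
      if (putter) {
        resolve(putter.value);
        putter.done(); // the value is consumed, so the sender may continue
      } else {
        this.pendingTakes.push(resolve);
      }
    });
  }
}
```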

In the direction of building blocks, I am currently most interested in a persistent, durable index data structure (only a first experiment) implemented against such portable protocols (e.g. konserve). This would give me nearly optimal delta compression for CRDT metadata in replikativ and, more generally, would make it possible to build fairly sophisticated snapshottable IO infrastructure, including an async query engine on Datom indices for DataScript. In particular, I am currently looking at porting the hitchhiker tree to konserve (instead of Redis). Do you think common infrastructure undertakings like this are reasonable, or do you see obstacles?
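As an illustration of writing an index against a portable, asynchronous storage protocol, here is a TypeScript sketch. AsyncKV, MemoryKV, IndexNode and writeSnapshot are hypothetical names; they borrow the general idea of an asynchronous key-value store like konserve but are not its API. Index code that depends only on the interface could, in principle, run unchanged over an in-memory, LevelDB, IndexedDB or Redis backend.

```typescript
// Hypothetical portable storage protocol: every operation is asynchronous.
interface AsyncKV<V> {
  get(key: string): Promise<V | undefined>;
  assoc(key: string, value: V): Promise<void>;
}

// In-memory backend; a durable backend would wrap callback-based IO in promises.
class MemoryKV<V> implements AsyncKV<V> {
  private readonly store = new Map<string, V>();

  get(key: string): Promise<V | undefined> {
    return Promise.resolve(this.store.get(key));
  }

  assoc(key: string, value: V): Promise<void> {
    this.store.set(key, value);
    return Promise.resolve();
  }
}

// A toy index node holding sorted keys.
interface IndexNode {
  keys: number[];
}

// Writes a new root under a version-specific key and never overwrites older
// roots, so earlier snapshots stay readable (assuming each version is written once).
async function writeSnapshot(kv: AsyncKV<IndexNode>, version: number, keys: number[]): Promise<string> {
  const rootKey = `root-${version}`;
  await kv.assoc(rootKey, { keys: [...keys].sort((a, b) => a - b) });
  return rootKey;
}
```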

whilo · Jan 20 '17

@whilo just shouting some encouragement - would love to see a strong story for distributed state handling based in DataScript :)

theronic · Mar 28 '18

@theronic How do you envision write coordination? Like in Datomic, with a single transactor?

whilo · Mar 28 '18