datahike icon indicating copy to clipboard operation
datahike copied to clipboard

Distributed reads and transactor for datahike-server

Open whilo opened this issue 4 years ago • 6 comments
trafficstars

This PR implements distributed read operations by redirecting the access to the connection to the underlying konserve storage to update the DB value (solving #322). It requires one roundtrip to fetch the root under :db and is a polling mechanism that is efficient if the reader is on average not accessing the database more often than it is being written to it and if the latency of the store access is not too high for the application. For low latency the transactor interface now exposes whether it is actively streaming, as is the default local transactor that we had so far (basically directly operating on the connection atom).

This PR also provides an implementation of the transactor interface for datahike-server, which provides the single writer to the underlying store and is not able to stream yet. Provided datahike server is running with the same datahike version and configured with the example configuration, the following configuration works now for an arbitrary number of Datahike peers (readers):

{:store    {:backend :file :path "/tmp/dh-file"}
            :keep-history? true ;; true
            :schema-flexibility :read
            :transactor {:backend :datahike-server
                         :client-config  {:timeout 300
                                          :endpoint "http://localhost:3333"
                                        ;:token "securerandompassword"
                                          :db-name "users"}}
            }

It is not clear to me yet whether we want to keep this transactor in this repository or move it into a separate one because we now have an additional dependency on datahike-client that we can avoid. Probably it is better to pull it out. The benefit of including it in Datahike is that Datahike will then always work with datahike-server instances out of the box.

whilo avatar May 21 '21 04:05 whilo

datahike-client will not work either with datahike when compiled with JDK<11 because I am using the JDK built-in http-client in the datahike-client

TimoKramer avatar May 23 '21 15:05 TimoKramer

datahike-client will not work either with datahike when compiled with JDK<11 because I am using the JDK built-in http-client in the datahike-client

@TimoKramer I've submitted a pull req to make the client work on JDK8

alekcz avatar May 26 '21 14:05 alekcz

@alekcz Does http-kit also work with Graal's native-images? We are currently heading in this direction and I think it will be particularly beneficial if our HTTP client is also embeddable into native images.

whilo avatar May 27 '21 07:05 whilo

babashka supports http-kit so this seems to work. I think it is a good choice because with borkdude supporting it it is obviously very well alive:)

TimoKramer avatar May 27 '21 07:05 TimoKramer

@TimoKramer I think Datahike should be usable without depending on libraries for network IO. Therefore I think it would not be bad to move the code of the actual DatahikeServerTransactor and with it the dependencies on a HTTP client out of this PR into a separate downstream repository that can be included into projects as needed. I would do the same for the native-image cli interface. Nonetheless it could potentially include the DatahikeServerTransactor.

whilo avatar May 27 '21 07:05 whilo

@alekcz Does http-kit also work with Graal's native-images? We are currently heading in this direction and I think it will be particularly beneficial if our HTTP client is also embeddable into native images.

Yes it does. It's actually a built-in library in babashka. now too.

alekcz avatar May 27 '21 07:05 alekcz

@jsmassa @kordano This PR is basically complete and ready for feedback. Tests are fixed and it works with datahike-server.

whilo avatar Dec 26 '22 22:12 whilo

This PR also covers https://github.com/replikativ/datahike/pull/439 btw. You can now pass in tx-metaa noCommit true flag to avoid writing to disk in a transaction. The benefit of this is that it is tracked as data per transaction instead of transactor state that could interleave with other transactions being dispatched or still pending.

whilo avatar Jan 05 '23 10:01 whilo

Ok, I just did a test also for the version checks we have now (to never load a database with older code versions than it was written with) and removed an error I accidentally applied as a wrong fix earlier.

whilo avatar Feb 20 '23 19:02 whilo

Looks well written in total but it lacks testing. With such a large new use case we should have a decent amount of tests covering the new feature.

@kordano I agree, but note that the code does not change the way the connection works for existing use cases and hence is tested for them with the current code base. Accessing a remote connection should be tested with an integration test including datahike-server and datahike-server-transactor. Since we also want to show case that, it would be good for me to get some feedback how to do it. We can then add an integration test in the same way we do it for datahike with konserve. Is that ok? I still think that this can be merged now then and the additional test can be added with the new use case.

whilo avatar Mar 13 '23 18:03 whilo

Yeah, it's fine to add them later on. So go ahead an release it. My other comments where primarily superficial and about formatting.

kordano avatar Mar 13 '23 18:03 kordano