Connect to existing filestore created in different process fails

Open jsmassa opened this issue 3 years ago • 8 comments

Reconnecting from within the same process works, but calling connect from another process fails with:

Execution error (ExceptionInfo) at datahike.connector/eval48190$fn (connector.cljc:154).
Database does not exist.

The respective directory exists though.

It previously worked with Datahike "0.3.2".

jsmassa avatar Sep 08 '21 17:09 jsmassa

Which konserve version is this?

whilo avatar Sep 08 '21 18:09 whilo

I used the current Datahike snapshot, which is using hitchhiker-tree 0.1.11, so it should be konserve 0.5.1

jsmassa avatar Sep 08 '21 18:09 jsmassa

Could it be because some lock on the file has not been released? Since shutdown-agents doesn't end the JVM process, at least not in reasonable time, I have to call System/exit at the end of the process that creates the database.

jsmassa avatar Sep 09 '21 07:09 jsmassa

Can you describe the interleaving of the steps of the two processes a bit more? Concurrent write operations between processes are not yet supported, but are addressed in https://github.com/replikativ/datahike/pull/337.

whilo avatar Sep 16 '21 04:09 whilo

It's happening with docker-compose. I am using one service to write data into a database and another service to read it. The file system database uses a volume directory, so it is persisted. The second service also sees the directory as well as the ksv files in it, but when I try to connect to the database, it claims the database does not exist. The whole thing worked before, so I am assuming it's either a change in datahike or in konserve that broke it.

The project where this is happening now is datahike-benchmark.

jsmassa avatar Sep 28 '21 09:09 jsmassa

Ok, let me provide some context that might help to trace back this issue. The key-to-filename translation is done by (hasch.core/uuid :db) => #uuid "0594e3b6-9635-5c99-8142-412accf3023b". This means you should see the file that corresponds to the db value under 0594e3b6-9635-5c99-8142-412accf3023b.ksv. In particular, you should be able to see what is actually stored by reading this file with (<!! (konserve.core/get store :db)) at the place in your code where the connect attempt fails. Since the existence check fails, the file seems not to be visible to the other process. The default store mode never overwrites files (it first writes a new version and then atomically renames it) and calls fsync, so data should never be lost unless your filesystem is not configured correctly or you deactivate the default options. But it is possible that the Docker container is now not fsync'ing in time for some reason. I would like to understand it a bit better, though, before proposing hypotheses.
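For anyone trying the check described above, here is a minimal sketch of the file-visibility part only. It assumes the UUID quoted in the comment and uses plain java.io, so it needs neither hasch nor konserve on the classpath. The helper names (ksv-file-name, db-file-visible?) and the path "/data/db" are made up for illustration and are not part of any library.

```clojure
(require '[clojure.java.io :as io])

;; UUID that the comment above reports for (hasch.core/uuid :db).
(def db-key-uuid #uuid "0594e3b6-9635-5c99-8142-412accf3023b")

(defn ksv-file-name
  "Hypothetical helper: the file name the comment above says to expect
  for a hashed key."
  [key-uuid]
  (str key-uuid ".ksv"))

(defn db-file-visible?
  "Hypothetical helper: true if the file for the :db key exists under
  store-dir, as seen by the current process."
  [store-dir]
  (.exists (io/file store-dir (ksv-file-name db-key-uuid))))

;; "/data/db" is a placeholder for the volume path mounted in the container.
(println (ksv-file-name db-key-uuid))
(println (db-file-visible? "/data/db"))
```

Running this in the reading service just before the failing connect call would show whether the file is visible there at all. If it is, the next step would be to read the value through konserve as suggested above.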

whilo avatar Oct 01 '21 07:10 whilo

We have to check this again with the new backends.

kordano avatar Nov 18 '21 15:11 kordano

This should be solved with the connection management in #332, although I am not sure what the actual issue was.

whilo avatar Mar 16 '23 01:03 whilo