rdf4j
NativeStore Transaction Blocking or Error with Different Transaction Levels
Current Behavior
While using IsolationLevels.NONE for long, continuous bulk uploads with the NativeStore, I see that a subsequent concurrent transaction on a separate RepositoryConnection (even one that only reads) is blocked if an IsolationLevel is supplied while the bulk upload is in progress.
Separately, if an IsolationLevel is not supplied, the NativeSailSource#fork method throws java.lang.UnsupportedOperationException: This store does not support multiple datasets.
The MemoryStore is capable of running reads while a bulk upload transaction is open. I suspect the LmdbStore will have the same issue as the NativeStore, since the LmdbSailSource throws the same error in fork().
Expected Behavior
Concurrent reads against the NativeStore succeed without blocking while a bulk upload is in progress, similar to how the MemoryStore behaves.
Steps To Reproduce
public static void main(String[] args) throws Exception {
    Path path = Paths.get("target/test");
    if (Files.exists(path)) {
        System.out.println("Deleting old repo");
        FileUtils.deleteDirectory(path.toFile());
    }
    ValueFactory vf = new ValidatingValueFactory();
    NativeStore store = new NativeStore();
    store.setDataDir(path.toFile());
    // MemoryStore store = new MemoryStore();
    Repository repo = new SailRepository(store);
    try (RepositoryConnection conn = repo.getConnection()) {
        System.out.println("Loading in data");
        conn.begin(IsolationLevels.NONE);
        conn.add(vf.createIRI("urn:test"), RDF.TYPE, OWL.ONTOLOGY, vf.createIRI("urn:graph"));
        conn.add(vf.createIRI("urn:test2"), RDF.TYPE, OWL.CLASS, vf.createIRI("urn:graph"));
        // Second connection reads without starting its own transaction
        try (RepositoryConnection conn2 = repo.getConnection()) {
            System.out.println("New Transaction");
            conn2.getStatements(vf.createIRI("urn:test"), RDF.TYPE, null);
            System.out.println("Read data");
        }
        conn.commit();
        System.out.println("Loaded in data");
    }
    repo.shutDown();
}
This results in the java.lang.UnsupportedOperationException: This store does not support multiple datasets error.
Adding a transaction begin on conn2 causes the application to hang on that begin() call:
try (RepositoryConnection conn = repo.getConnection()) {
    System.out.println("Loading in data");
    conn.begin(IsolationLevels.NONE);
    conn.add(vf.createIRI("urn:test"), RDF.TYPE, OWL.ONTOLOGY, vf.createIRI("urn:graph"));
    conn.add(vf.createIRI("urn:test2"), RDF.TYPE, OWL.CLASS, vf.createIRI("urn:graph"));
    try (RepositoryConnection conn2 = repo.getConnection()) {
        conn2.begin(); // hangs here while the outer NONE transaction is open
        System.out.println("New Transaction");
        conn2.getStatements(vf.createIRI("urn:test"), RDF.TYPE, null);
        System.out.println("Read data");
        conn2.commit();
    }
    conn.commit();
    System.out.println("Loaded in data");
}
repo.shutDown();
Both of these operations succeed with the MemoryStore, so this appears to be a limitation of the NativeStore and LmdbStore.
Version
4.2.3
Are you interested in contributing a solution yourself?
Perhaps?
Anything else?
No response
@abrokenjester I see that you originally committed the NativeSailSource implementation back in 2016.
As someone not very familiar with the inner workings of the stores, would you be able to shed some light on what "This store does not support multiple datasets" means in this context, and whether it still applies after all these years?
I suppose that the error does not happen if the connections are created in different threads?
The locking mechanism appears to block until the outer transaction with IsolationLevels.NONE finishes. If I wrap the inner transaction in a CompletableFuture and call get() (a blocking call that waits for the result of the thread) on that CompletableFuture within the outer transaction, then we get stuck in a waiting state.
try (RepositoryConnection conn = repo.getConnection()) {
    System.out.println("Loading in data");
    conn.begin(IsolationLevels.NONE);
    conn.add(vf.createIRI("urn:test"), RDF.TYPE, OWL.ONTOLOGY, vf.createIRI("urn:graph"));
    conn.add(vf.createIRI("urn:test2"), RDF.TYPE, OWL.CLASS, vf.createIRI("urn:graph"));
    // Run the inner read transaction on a separate thread
    CompletableFuture<Boolean> cf = CompletableFuture.supplyAsync(() -> {
        try (RepositoryConnection conn2 = repo.getConnection()) {
            conn2.begin();
            System.out.println("New Transaction");
            conn2.getStatements(vf.createIRI("urn:test"), RDF.TYPE, null);
            System.out.println("Read data");
            conn2.commit();
        }
        System.out.println("Loaded in data");
        return true;
    });
    System.out.println(cf.get()); // blocking until CF completes
    conn.commit();
}
Is the transaction isolation logic expected to block any concurrent READ operations while a transaction with IsolationLevels.NONE is still active?
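To observe the hang described above without freezing the whole reproduction, the blocking get() can be bounded with a timeout. The following is only a sketch in plain Java: the class TimedWaitDemo, the helper awaitWithTimeout, and the CountDownLatch standing in for the outer commit are all hypothetical names and no RDF4J classes are involved.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedWaitDemo {

    /** Waits for the future for at most timeoutMillis and reports what happened. */
    static String awaitWithTimeout(CompletableFuture<?> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "completed";
        } catch (TimeoutException e) {
            return "timed out";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        // Assumption: the latch stands in for the outer transaction's commit.
        CountDownLatch outerCommit = new CountDownLatch(1);

        // The inner task cannot proceed until the "outer commit" has happened.
        CompletableFuture<Boolean> inner = CompletableFuture.supplyAsync(() -> {
            try {
                outerCommit.await();
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        });

        System.out.println(awaitWithTimeout(inner, 200));   // outer work still open
        outerCommit.countDown();                            // simulate conn.commit()
        System.out.println(awaitWithTimeout(inner, 2000));  // inner task can now finish
    }
}
```

In the real reproduction, the same bounded get(timeout, unit) call on cf would throw a TimeoutException instead of waiting forever if the inner begin() is still blocked.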
I remember looking into performance optimisations for bulk loading the native store and also discovered that the code takes an exclusive lock.
I think the theory is that if the user has a transaction A with READ_COMMITTED and a transaction B with NONE, then partial writes in transaction B should still not be visible in transaction A. The isolation level defines your own view and shouldn't affect anyone else's transaction. The way that the NONE isolation level is implemented makes it very challenging to satisfy the isolation levels of other transactions, which is why it uses locking instead.
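The locking behaviour described above can be illustrated with a small sketch in plain Java. It is not the NativeStore's actual code: the class ExclusiveLockDemo and its storeLock field are hypothetical, and a ReentrantReadWriteLock is assumed here only to model "the writer holds an exclusive lock for the whole transaction".

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ExclusiveLockDemo {

    // Hypothetical stand-in for a store-wide lock; not an RDF4J class.
    private final ReentrantReadWriteLock storeLock = new ReentrantReadWriteLock();

    /** Tries to take the read lock on a separate thread; false means it timed out. */
    boolean tryReadFromOtherThread(long timeoutMillis) {
        AtomicBoolean acquired = new AtomicBoolean(false);
        Thread reader = new Thread(() -> {
            try {
                if (storeLock.readLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    acquired.set(true);
                    storeLock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return acquired.get();
    }

    /** Returns { read succeeded during the write, read succeeded after the write }. */
    static boolean[] run() {
        ExclusiveLockDemo demo = new ExclusiveLockDemo();
        demo.storeLock.writeLock().lock();       // models begin(NONE): exclusive lock taken
        boolean during;
        try {
            during = demo.tryReadFromOtherThread(200);
        } finally {
            demo.storeLock.writeLock().unlock(); // models commit(): lock released
        }
        boolean after = demo.tryReadFromOtherThread(200);
        return new boolean[] { during, after };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("read during write: " + result[0]); // false
        System.out.println("read after write: " + result[1]);  // true
    }
}
```

Under this model a reader never sees partial writes, but only because it cannot run at all until the writer finishes, which matches the blocking seen in the reproduction.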
Interesting. Aren't any statement writes/deletes with NONE effectively committed? The documentation below makes it a little unclear with its "may not". Why is it that the MemoryStore does not have the same limitation of fully blocking other transactions during a NONE transaction that the NativeStore does? Is it I/O related?
NONE The lowest isolation level; transactions can see their own changes, but may not be able to roll them back, and no support isolation among transactions is guaranteed. This isolation level is typically used for things like bulk data upload operations.
The other issue mentioned above is that an Exception is thrown when transaction A with NONE is active and another RepositoryConnection action is performed (i.e., getStatements) without starting a transaction.
Caused by: java.lang.UnsupportedOperationException: This store does not support multiple datasets
    at org.eclipse.rdf4j.sail.nativerdf.NativeSailStore$NativeSailSource.fork(NativeSailStore.java:324)
    at org.eclipse.rdf4j.sail.base.SailSourceConnection.branch(SailSourceConnection.java:961)
    at org.eclipse.rdf4j.sail.base.SailSourceConnection.getStatementsInternal(SailSourceConnection.java:401)
    at org.eclipse.rdf4j.sail.helpers.AbstractSailConnection.getStatements(AbstractSailConnection.java:358)
    at org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.getStatements(SailRepositoryConnection.java:346)
    at org.eclipse.rdf4j.repository.RepositoryConnection.getStatements(RepositoryConnection.java:399)
Is this a symptom of fully blocking other transactions while a NONE transaction is active?
I think that both the NativeStore and the MemoryStore run the actual commit phase serially with locks or with synchronisation. The NONE isolation level probably just uses this part of the NativeStore.
Flushing to memory is much faster than flushing to disk. So it's not very noticeable in the MemoryStore.
The LmdbStore actually supports one write transaction with multiple concurrent read transactions. But it also uses the SnapshotSailStore and hence probably does not yet leverage this functionality.
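The "one writer, many concurrent readers" model mentioned above can be sketched in plain Java as follows. This is an illustration of the general idea only, not how LMDB or the LmdbStore is implemented: the class SnapshotStoreDemo, its WriteTxn type, and the use of an immutable Set published through an AtomicReference are all assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

public class SnapshotStoreDemo {

    // Last committed state; readers only ever see this immutable snapshot.
    private final AtomicReference<Set<String>> committed = new AtomicReference<>(Set.of());

    // Allows a single write transaction at a time.
    private final Semaphore writerPermit = new Semaphore(1);

    /** The single writer's private working copy of the data. */
    final class WriteTxn {
        private final Set<String> working = new HashSet<>(committed.get());

        void add(String statement) {
            working.add(statement);
        }

        void commit() {
            committed.set(Set.copyOf(working)); // publish a new immutable snapshot
            writerPermit.release();
        }
    }

    WriteTxn beginWrite() {
        writerPermit.acquireUninterruptibly();
        return new WriteTxn();
    }

    /** Readers never block: they just take the current committed snapshot. */
    Set<String> read() {
        return committed.get();
    }

    /** Returns { size seen by a reader before commit, size seen after commit }. */
    static int[] run() {
        SnapshotStoreDemo store = new SnapshotStoreDemo();
        WriteTxn txn = store.beginWrite();
        txn.add("urn:test rdf:type owl:Ontology");
        int before = store.read().size(); // uncommitted write is not visible
        txn.commit();
        int after = store.read().size();  // visible once committed
        return new int[] { before, after };
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println("before commit: " + sizes[0]); // 0
        System.out.println("after commit: " + sizes[1]);  // 1
    }
}
```

Compared with an exclusive lock, readers here are isolated from partial writes without ever waiting on the writer, at the cost of the writer keeping a separate working copy until commit.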