Transactions delete statements when Adding/Removing Graphs

Open daltontc opened this issue 2 years ago • 6 comments

Current Behavior

I have a graph in a repository that contains only statements whose subject is the same as the graph IRI (this occurs in both the Memory and Native stores):

<https://mobi.com/records#someRecord> {
  <https://mobi.com/records#someRecord> <http://purl.org/dc/terms/title> "asdf";
    <http://purl.org/dc/terms/issued> "2022-04-11T13:11:15.855-06:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <http://purl.org/dc/terms/modified> "2022-04-11T13:11:15.88-06:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <http://mobi.com/ontologies/ontology-editor#ontologyIRI> <https://mobi.com/ontologies/Asdf> .
}
  1. Start a transaction
  2. Remove the graph
  3. Load an updated graph in which the ontologyIRI statement is changed to <https://mobi.com/records#someRecord> <http://mobi.com/ontologies/ontology-editor#ontologyIRI> <https://mobi.com/ontologies/Qwerty>
  4. Call getStatements with the subject IRI
  5. Load the results into a Model (this step, combined with the transaction, causes the issue)
  6. Commit
  7. Retrieve the graph

Result:

  • The Model built from getStatements mid-transaction contains both the deleted and the added statements
  • The final graph contains only the statement <https://mobi.com/records#someRecord> <http://mobi.com/ontologies/ontology-editor#ontologyIRI> <https://mobi.com/ontologies/Qwerty>

Expected Behavior

  1. The graph should contain all added statements:
<https://mobi.com/records#someRecord> {
  <https://mobi.com/records#someRecord> <http://purl.org/dc/terms/title> "asdf";
    <http://purl.org/dc/terms/issued> "2022-04-11T13:11:15.855-06:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <http://purl.org/dc/terms/modified> "2022-04-11T13:11:15.88-06:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>;
    <http://mobi.com/ontologies/ontology-editor#ontologyIRI> <https://mobi.com/ontologies/Qwerty> .
}
  2. Retrieving statements from the graph within a transaction shouldn't affect the end state of the graph.
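
The expected end state can be sketched with a toy model. The class below is a hypothetical, stdlib-only illustration (the names GraphStateSketch, recordGraph and replaceGraph are invented here, and datatypes on the literals are omitted for brevity); it is not RDF4J code and says nothing about how RDF4J stores statements. It only assumes that "remove the graph, then add the updated graph" should behave like a set replacement, and counts which statements are lost if only the ontologyIRI statement survives, as reported above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of a quad store as a set of (subject, predicate, object, graph) lists. Not RDF4J. */
final class GraphStateSketch {
    static final String RECORD = "https://mobi.com/records#someRecord";
    static final String DCT = "http://purl.org/dc/terms/";
    static final String ONTOLOGY_IRI = "http://mobi.com/ontologies/ontology-editor#ontologyIRI";

    static List<String> quad(String s, String p, String o, String g) {
        return Arrays.asList(s, p, o, g);
    }

    /** Builds the record graph from the report; only the ontologyIRI object varies. */
    static Set<List<String>> recordGraph(String ontologyIri) {
        Set<List<String>> quads = new LinkedHashSet<>();
        quads.add(quad(RECORD, DCT + "title", "asdf", RECORD));
        quads.add(quad(RECORD, DCT + "issued", "2022-04-11T13:11:15.855-06:00", RECORD));
        quads.add(quad(RECORD, DCT + "modified", "2022-04-11T13:11:15.88-06:00", RECORD));
        quads.add(quad(RECORD, ONTOLOGY_IRI, ontologyIri, RECORD));
        return quads;
    }

    /** Assumed semantics: drop every quad in the given graph, then add the replacement quads. */
    static Set<List<String>> replaceGraph(Set<List<String>> store, String graph,
                                          Set<List<String>> replacement) {
        Set<List<String>> result = new LinkedHashSet<>();
        for (List<String> q : store) {
            if (!q.get(3).equals(graph)) {
                result.add(q);
            }
        }
        result.addAll(replacement);
        return result;
    }

    public static void main(String[] args) {
        Set<List<String>> original = recordGraph("https://mobi.com/ontologies/Asdf");
        Set<List<String>> updated = recordGraph("https://mobi.com/ontologies/Qwerty");
        Set<List<String>> expected = replaceGraph(original, RECORD, updated);
        System.out.println("expected size: " + expected.size()); // prints "expected size: 4"

        // In the reported run only the ontologyIRI statement survives; list what is lost.
        Set<List<String>> lost = new LinkedHashSet<>(expected);
        lost.removeIf(q -> q.get(1).equals(ONTOLOGY_IRI));
        System.out.println("lost in the reported run: " + lost.size()); // prints "lost in the reported run: 3"
    }
}
```

Under this assumption the final graph should hold four statements, whereas the reported run ends with one, so the title, issued and modified statements are the ones that go missing.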

Steps To Reproduce

Clone the bug/transaction_retrieval branch (https://github.com/daltontc/rdf4jTest/tree/bug/transaction_retrieval) and run Main.

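import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.UUID;

import org.eclipse.rdf4j.model.IRI;
import org.eclipse.rdf4j.model.Model;
import org.eclipse.rdf4j.model.Resource;
import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.model.ValueFactory;
import org.eclipse.rdf4j.model.impl.SimpleValueFactory;
import org.eclipse.rdf4j.query.QueryResults;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.RepositoryResult;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.sail.memory.MemoryStore;
import org.eclipse.rdf4j.sail.nativerdf.NativeStore;

public class Main {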
    static ValueFactory vf = SimpleValueFactory.getInstance();
    static IRI recordId = vf.createIRI("https://mobi.com/records#someRecord");

    public static void main(String[] args) throws IOException {
        File repoDir = new File("target/datadir/" + UUID.randomUUID());
        NativeStore nativeStore = new NativeStore();
        MemoryStore memoryStore = new MemoryStore();
        Repository repo = new SailRepository(memoryStore);

        InputStream stream = Main.class.getResourceAsStream("/record_def_original.trig");
        try (RepositoryConnection conn = repo.getConnection()) {
            conn.add(stream, RDFFormat.TRIG);
            RepositoryResult<Statement> stmts = conn.getStatements(recordId, null, null);
            Model model = QueryResults.asModel(stmts);
            stmts.close();
            System.out.println(model.size());
        }

        try (RepositoryConnection conn = repo.getConnection()) {
            conn.begin(); // Occurs with any transaction level > NONE
            conn.remove((Resource) null, null, null, recordId);
            // Clear produces the same result
            // conn.clear(recordId);
            conn.add(Rio.parse(Main.class.getResourceAsStream("/record_def_change.trig"), RDFFormat.TRIG));

            // Retrieval by graph provides expected result
            // RepositoryResult<Statement> stmts = conn.getStatements(null, null, null, recordId);
            RepositoryResult<Statement> stmts = conn.getStatements(recordId, null, null);
            Model model = QueryResults.asModel(stmts);
            System.out.println(model.size());
            stmts.close();
            conn.commit(); // Same behavior if moved below last retrieval

            RepositoryResult<Statement> recordGraph = conn.getStatements(null, null, null, recordId);
            Model resultFinal = QueryResults.asModel(recordGraph);
            recordGraph.close();
            System.out.println(resultFinal.size());
        }

        repoDir.delete();
    }
}

Version

3.7.6

Are you interested in contributing a solution yourself?

No response

Anything else?

No response

daltontc avatar Apr 12 '22 15:04 daltontc

What I am noticing is that it has something to do with the interaction of the following:

  1. Transaction start
  2. Graph removal
  3. Same graph addition
  4. Queries against graph after addition (causes removal)

In the case below, every field that I query for in the TupleQuery, and that should exist in the updated graph, ends up being removed from the repository.

public static void main(String[] args) throws IOException {
        File repoDir = new File("target/datadir/" + UUID.randomUUID());
        NativeStore nativeStore = new NativeStore();
        MemoryStore memoryStore = new MemoryStore();
        Repository repo = new SailRepository(memoryStore);

        InputStream stream = Main.class.getResourceAsStream("/record_def_original.trig");
        try (RepositoryConnection conn = repo.getConnection()) {
            conn.add(stream, RDFFormat.TRIG);
            RepositoryResult<Statement> stmts = conn.getStatements(recordId, null, null);
            Model model = QueryResults.asModel(stmts);
            stmts.close();
            System.out.println(model.size());
        }

        try (RepositoryConnection conn = repo.getConnection()) {
            conn.begin(); // Occurs with any transaction level > NONE
            conn.remove((Resource) null, null, null, recordId);
            // Clear produces the same result
            // conn.clear(recordId);
            conn.add(Rio.parse(Main.class.getResourceAsStream("/record_def_change.trig"), RDFFormat.TRIG));

            // ********************************************************************************************************
            // TODO: NEWLY ADDED QUERY
            TupleQuery query = conn.prepareTupleQuery(
                    "PREFIX dct: <http://purl.org/dc/terms/>\n" +
                    "\n" +
                    "SELECT *\n" +
                    "WHERE {\n" +
                    "    ?record dct:issued ?issued;\n" +
                    "            dct:modified ?modified .\n" +
                    "}");
            TupleQueryResult result = query.evaluate();
            if (result.hasNext()) {
                System.out.println(result.next().getBinding("issued"));
            }
            result.close();
            conn.commit(); // Same behavior if moved below last retrieval
            // ********************************************************************************************************

            RepositoryResult<Statement> recordGraph = conn.getStatements(null, null, null, recordId);
            Model resultFinal = QueryResults.asModel(recordGraph);
            recordGraph.close();

            System.out.println(resultFinal.size());
            resultFinal.forEach(System.out::println);
        }

        repoDir.delete();
    }

daltontc avatar Apr 12 '22 20:04 daltontc

I have run a quick verification and I seem to be able to reproduce the problem. Have you been able to run variants? For example, is the problem something that only occurs when the graph name and the subject IRI are identical?

abrokenjester avatar Apr 13 '22 11:04 abrokenjester

I tested a couple of variants and was only seeing this removal behavior when the graph and subject IRI are identical. It didn't occur for other subject IRIs in the graph. Nor did it occur for statements whose subject or predicate were the graph name and were queried for those subject/predicates.

daltontc avatar Apr 13 '22 13:04 daltontc

@jeenbroekstra does this bug also apply to 4.0.0-M3? Is it something that can/should be fixed before 4.0.0?

hmottestad avatar Apr 14 '22 12:04 hmottestad

Things work as expected if I retrieve all the statements and remove them as I iterate on them.

Change the conn.clear(recordId) / conn.remove((Resource) null, null, null, recordId) call to conn.getStatements(null, null, null, recordId).forEach(conn::remove);

daltontc avatar Apr 14 '22 16:04 daltontc

@jeenbroekstra does this bug also apply to 4.0.0-M3? Is it something that can/should be fixed before 4.0.0?

I think so, yes. Though it seems sufficiently like a corner case that it's not necessarily a blocker.

abrokenjester avatar Apr 14 '22 19:04 abrokenjester