virtuoso-opensource icon indicating copy to clipboard operation
virtuoso-opensource copied to clipboard

Triple with string value cannot be deleted

Open idimopoulos opened this issue 2 years ago • 5 comments

We are having a PHP application that is heavily using a dockerized version of Virtuoso and is working fine (A complete Drupal website storing and retrieving data constantly from the endpoint). We have a problem which is VERY weird and we have come across it only now after 5 years of using the endpoint.

We are using a simple query to test the write permission, mainly we write a simple triple and delete it. We are using the following:

 WITH <http://example.com/id/rdf_entity_21092>
 INSERT { <http://example.com/id/rdf_entity_21092> <http://example.com/predicate> "test value" }
 CLEAR GRAPH <http://example.com/id/rdf_entity_21092>

The problem is that it works fine most of the times but there was this one case where we cannot really query or delete the triple.

The following is a list of queries we ran:

SQL> sparql select * where { ?s ?p "test value" };
s                                                                                 p
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://example.com/id/rdf_entity_21092                                            http://example.com/predicate

The query searches directly for the triples with the "test value" and returns results.

1 Rows. -- 3 msec.
SQL> sparql delete where { graph ?g { ?s ?p "test value" } };

Done. -- 4 msec.

The query seems to be able to delete the quad (using the web version, we get "Successfully deleted 1 triple".

SQL> sparql select * where { ?s ?p "test value" };
s                                                                                 p
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://example.com/id/rdf_entity_21092                                            http://example.com/predicate

1 Rows. -- 0 msec.

We still get results.

SQL> sparql delete where { graph ?g { <http://example.com/id/rdf_entity_21092> ?p ?o } };

Done. -- 1 msec.

Trying to delete using the subject

SQL> sparql select * where { ?s ?p "test value" };
s                                                                                 p
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________

http://example.com/id/rdf_entity_21092                                            http://example.com/predicate

1 Rows. -- 0 msec.

Still getting results.

And the weirdest part is that the following query bring 0 results even though the subject is correct.

SQL> sparql select * where { <http://example.com/id/rdf_entity_21092> ?p ?o };
p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________


0 Rows. -- 1 msec.

idimopoulos avatar Nov 30 '23 11:11 idimopoulos

Oh, forgot, I am using Version: 07.20.3229 Build: Aug 21 2019 (000000)

idimopoulos avatar Nov 30 '23 11:11 idimopoulos

The problem does not occur with the latest OpenLink Virtuoso Open Source Docker Container, thus I would recommend upgrading your installation:

SQL> sparql WITH <http://example.com/id/rdf_entity_21092>  INSERT { <http://example.com/id/rdf_entity_21092> <http://example.com/predicate> "test value" }  CLEAR GRAPH <http://example.com/id/rdf_entity_21092>;

Done. -- 4 msec.
SQL> sparql select * where { ?s ?p "test value" };
s                                                                                 p
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________


0 Rows. -- 3 msec.
SQL> sparql select * where { <http://example.com/id/rdf_entity_21092> ?p ?o };
p                                                                                 o
LONG VARCHAR                                                                      LONG VARCHAR
_______________________________________________________________________________


0 Rows. -- 6 msec.
SQL> status();
REPORT
VARCHAR
_______________________________________________________________________________

OpenLink Virtuoso  Server
Version 07.20.3237-pthreads for Linux as of Jun  7 2023 (f3d88f16b)
Started on: 2023-11-30 13:28 GMT+0

HughWilliams avatar Nov 30 '23 13:11 HughWilliams

@idimopoulos

Please check the DB & RDF store consistency

  • Check database execute backup '/dev/null' via command-line isql tool

Does it report in server log errors?

  • Check RDF store with following queries:
select count (*) from RDF_QUAD a table option (index RDF_QUAD) 
    where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_POGS) 
        where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);

select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS) 
    where not exists (select 1 from RDF_QUAD b table option (loop, index primary key) 
        where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);

select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS) 
    where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_OP) 
        where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);

select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS) 
    where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_SP) 
        where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);

select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS) 
    where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_GS) 
        where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);

See if any of them return non-zero count?

--

imitko avatar Dec 01 '23 09:12 imitko

@HughWilliams I will try but I will be off next week so please, give me some time before closing the issue as inactive for too long.

@imitko I must say I don't know what the queries are doing. The backup command completed without issues, the rest returned 1 for 4 out of 5 queries.

SQL> backup '/dev/null';
Connected to OpenLink Virtuoso
Driver: 07.20.3229 OpenLink Virtuoso ODBC Driver

Done. -- 1994 msec.
SQL> select count (*) from RDF_QUAD a table option (index RDF_QUAD)
Type the rest of statement, end with a semicolon (;)>     where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_POGS)
Type the rest of statement, end with a semicolon (;)>         where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
count
INTEGER
_______________________________________________________________________________

0

1 Rows. -- 166 msec.
SQL> select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS)
Type the rest of statement, end with a semicolon (;)>     where not exists (select 1 from RDF_QUAD b table option (loop, index primary key)
Type the rest of statement, end with a semicolon (;)>         where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
count
INTEGER
_______________________________________________________________________________

1

1 Rows. -- 120 msec.
SQL> select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS)
Type the rest of statement, end with a semicolon (;)>     where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_OP)
Type the rest of statement, end with a semicolon (;)>         where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
count
INTEGER
_______________________________________________________________________________

1

1 Rows. -- 246 msec.
SQL> select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS)
Type the rest of statement, end with a semicolon (;)>     where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_SP)
Type the rest of statement, end with a semicolon (;)>         where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
count
INTEGER
_______________________________________________________________________________

1

1 Rows. -- 182 msec.
SQL> select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS)
Type the rest of statement, end with a semicolon (;)>     where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_SP)
Type the rest of statement, end with a semicolon (;)>         where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
count
INTEGER
_______________________________________________________________________________

1

1 Rows. -- 177 msec.
SQL> select count (*) from RDF_QUAD a table option (index RDF_QUAD_POGS)
Type the rest of statement, end with a semicolon (;)>     where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_GS)
Type the rest of statement, end with a semicolon (;)>         where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
count
INTEGER
_______________________________________________________________________________

1

1 Rows. -- 164 msec.
SQL>

idimopoulos avatar Dec 01 '23 14:12 idimopoulos

@idimopoulos

It seem you have one quad row out of sync for some reason. You can try to fix it with the steps below. Please note the size of database does matter, and this process may take a long time on huge databases.

  1. Create online backup (or make a backup another way, as in the docs):
    SQL> backup_online('db-backup-', 10000000);
    
  2. Then repair the missing quad row using the command-line isql tool:
    create table rq_recov (g iri_id_8, s iri_id_8, p iri_id_8, o any, primary key (g,s,p,o));
    
    log_enable (3,1);
    insert soft rq_recov (g,s,p,o) select g,s,p,o from RDF_QUAD a table option (index RDF_QUAD) 
        where not exists (select 1 from RDF_QUAD b table option (loop, index RDF_QUAD_POGS) 
            where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
    insert soft rq_recov (g,s,p,o) select g,s,p,o from RDF_QUAD a table option (index RDF_QUAD_POGS) 
        where not exists (select 1 from RDF_QUAD b table option (loop, index primary key) 
            where a.G = b.G and a.P = b.P and a.O = b.O and a.S = b.S);
    insert soft rdf_quad index rdf_quad (g,s,p,o)  select g,s,p,o from rq_recov;
    insert soft rdf_quad index rdf_quad_pogs (g,s,p,o)  select g,s,p,o from rq_recov;
    insert soft rdf_quad index rdf_quad_sp (g,s,p,o)  select g,s,p,o from rq_recov;
    insert soft rdf_quad index rdf_quad_op (g,s,p,o)  select g,s,p,o from rq_recov;
    insert soft rdf_quad index rdf_quad_gs (g,s,p,o)  select g,s,p,o from rq_recov;
    log_enable (1,1);
    checkpoint;
    
    The above would copy the missing quad row from one index in the temp table rq_recov, then inserts would try put it back in place.
  3. Next, you can test the delete query and see if the issue still persists. If the issue is resolved, you can drop table rq_recov.
  4. Let us know your results.

imitko avatar Dec 01 '23 14:12 imitko

Wow, I am not nearly close to a good virtuoso dev and I could hardly understand this too. Sorry for being THAT late but there were so many priorities in the project. The above worked indeed. Thank you very much, I will communicate it to our devops. Closing the issue.

idimopoulos avatar Feb 20 '24 22:02 idimopoulos