jena icon indicating copy to clipboard operation
jena copied to clipboard

Server exit during TDB2 compaction leaves partial storage directory

Open afs opened this issue 1 year ago • 11 comments

Version

4.x, 5.x

What happened?

Report : users@ 2024-02-06

?deleteOld=true is possibly unrelated.

Server exit during the building phase of the new, compacted database (work done by CopyDSG.copy called from DatabaseOps.compact) needs to have some means that a restarted server can clear up.

There does not seem to be any persistent state error handling/recovery.

It is unclear if running on MS Windows (the JDK issue with memory mapped files) is a factor.

Relevant output and stacktrace

No response

Are you interested in making a pull request?

Yes

afs avatar Feb 07 '24 20:02 afs

Hi, thanks for taking a look at this!

We're running Fuseki in a Docker container (based on eclipse-temurin:17-jre-focal which in turn is based on Ubuntu). The host is running Ubuntu 22.04.3 LTS.

Here are the relevant logs:

12:55:12 INFO Server :: Apache Jena Fuseki 4.10.0
12:55:12 INFO Config :: FUSEKI_HOME=/jena-fuseki
12:55:12 INFO Config :: FUSEKI_BASE=/fuseki
12:55:12 INFO Config :: Shiro file: file:///fuseki/shiro.ini
12:55:13 INFO Config :: Load configuration: file:///fuseki/configuration/dsp-repo.ttl
12:55:13 INFO Server :: Configuration file: /fuseki/config.ttl
12:55:13 INFO Server :: Path = /dsp-repo
12:55:13 INFO Server :: Memory: 1.0 GiB
12:55:13 INFO Server :: Java: 17.0.10
12:55:13 INFO Server :: OS: Linux 5.15.0-91-generic amd64
12:55:13 INFO Server :: PID: 7
12:55:13 INFO Server :: Started 2024/02/08 12:55:13 UTC on port 3030
<some queries>
12:57:35 INFO Admin :: [47] Compact dataset /dsp-repo
12:57:35 INFO Server :: Task : 1 : Compact
12:57:35 INFO Server :: [Task 1] starts : Compact
12:57:35 INFO Compact :: [47] >>>> Start compact /dsp-repo
12:57:35 INFO Admin :: [48] Task 1
12:57:45 INFO Admin :: [49] Task 1
12:57:55 INFO Admin :: [51] Task 1
12:58:05 INFO Admin :: [54] Task 1
12:58:15 INFO Admin :: [55] Task 1
12:58:25 INFO Admin :: [57] Task 1
<stopped fuseki>
12:58:46 INFO Server :: Apache Jena Fuseki 4.10.0
12:58:46 INFO Config :: FUSEKI_HOME=/jena-fuseki
12:58:46 INFO Config :: FUSEKI_BASE=/fuseki
12:58:46 INFO Config :: Shiro file: file:///fuseki/shiro.ini
12:58:46 INFO Config :: Load configuration: file:///fuseki/configuration/dsp-repo.ttl
12:58:47 INFO Server :: Configuration file: /fuseki/config.ttl
12:58:47 INFO Server :: Path = /dsp-repo
12:58:47 INFO Server :: Memory: 1.0 GiB
12:58:47 INFO Server :: Java: 17.0.10
12:58:47 INFO Server :: OS: Linux 5.15.0-91-generic amd64
12:58:47 INFO Server :: PID: 7
12:58:47 INFO Server :: Started 2024/02/08 12:58:47 UTC on port 3030

There are no errors in the logs and Fuseki starts up just fine after it was stopped during compaction, it's just missing most of the original data.

And these are the files in Data-0001 (original) and Data-0002 (partially compacted):

root@dd2493b7258a:/fuseki/databases/dsp-repo# ls -la Data-0001
total 8408748
drwxr-xr-x 2 root root       4096 Feb  8 04:01 .
drwxr-xr-x 4 root root       4096 Feb  8 13:57 ..
-rw-r--r-- 1 root root         24 Feb  8 04:10 GOSP.bpt
-rw-r--r-- 1 root root 1317011456 Feb  8 04:10 GOSP.dat
-rw-r--r-- 1 root root   16777216 Feb  8 04:10 GOSP.idn
-rw-r--r-- 1 root root         24 Feb  8 04:10 GPOS.bpt
-rw-r--r-- 1 root root 1342177280 Feb  8 04:10 GPOS.dat
-rw-r--r-- 1 root root   16777216 Feb  8 04:10 GPOS.idn
-rw-r--r-- 1 root root         24 Feb  8 04:10 GPU.bpt
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 GPU.dat
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 GPU.idn
-rw-r--r-- 1 root root         24 Feb  8 04:10 GSPO.bpt
-rw-r--r-- 1 root root 1375731712 Feb  8 04:10 GSPO.dat
-rw-r--r-- 1 root root   16777216 Feb  8 04:10 GSPO.idn
-rw-r--r-- 1 root root          0 Feb  8 04:10 journal.jrnl
-rw-r--r-- 1 root root         24 Feb  8 04:10 nodes.bpt
-rw-r--r-- 1 root root  176160768 Feb  8 04:10 nodes.dat
-rw-r--r-- 1 root root         16 Feb  8 04:10 nodes-data.bdf
-rw-r--r-- 1 root root  308827588 Feb  8 04:10 nodes-data.obj
-rw-r--r-- 1 root root    8388608 Feb  8 04:10 nodes.idn
-rw-r--r-- 1 root root         24 Feb  8 04:10 OSP.bpt
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 OSP.dat
-rw-r--r-- 1 root root         24 Feb  8 04:10 OSPG.bpt
-rw-r--r-- 1 root root 1317011456 Feb  8 04:10 OSPG.dat
-rw-r--r-- 1 root root   16777216 Feb  8 04:10 OSPG.idn
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 OSP.idn
-rw-r--r-- 1 root root         24 Feb  8 04:10 POS.bpt
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 POS.dat
-rw-r--r-- 1 root root         24 Feb  8 04:10 POSG.bpt
-rw-r--r-- 1 root root 1350565888 Feb  8 04:10 POSG.dat
-rw-r--r-- 1 root root   16777216 Feb  8 04:10 POSG.idn
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 POS.idn
-rw-r--r-- 1 root root         24 Feb  8 04:10 prefixes.bpt
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 prefixes.dat
-rw-r--r-- 1 root root         16 Feb  8 04:10 prefixes-data.bdf
-rw-r--r-- 1 root root          0 Feb  8 04:01 prefixes-data.obj
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 prefixes.idn
-rw-r--r-- 1 root root         24 Feb  8 04:10 SPO.bpt
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 SPO.dat
-rw-r--r-- 1 root root         24 Feb  8 04:10 SPOG.bpt
-rw-r--r-- 1 root root 1375731712 Feb  8 04:10 SPOG.dat
-rw-r--r-- 1 root root   16777216 Feb  8 04:10 SPOG.idn
-rw-r--r-- 1 root root    8388608 Feb  8 04:01 SPO.idn
-rw-r--r-- 1 root root          2 Feb  8 13:55 tdb.lock
root@dd2493b7258a:/fuseki/databases/dsp-repo# ls -la Data-0002
total 982360
drwxr-xr-x 2 root root      4096 Feb  8 13:57 .
drwxr-xr-x 4 root root      4096 Feb  8 13:57 ..
-rw-r--r-- 1 root root        24 Feb  8 13:59 GOSP.bpt
-rw-r--r-- 1 root root 150994944 Feb  8 13:59 GOSP.dat
-rw-r--r-- 1 root root   8388608 Feb  8 13:59 GOSP.idn
-rw-r--r-- 1 root root        24 Feb  8 13:59 GPOS.bpt
-rw-r--r-- 1 root root 159383552 Feb  8 13:59 GPOS.dat
-rw-r--r-- 1 root root   8388608 Feb  8 13:59 GPOS.idn
-rw-r--r-- 1 root root        24 Feb  8 13:59 GPU.bpt
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 GPU.dat
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 GPU.idn
-rw-r--r-- 1 root root        24 Feb  8 13:59 GSPO.bpt
-rw-r--r-- 1 root root 167772160 Feb  8 13:59 GSPO.dat
-rw-r--r-- 1 root root   8388608 Feb  8 13:59 GSPO.idn
-rw-r--r-- 1 root root         0 Feb  8 13:59 journal.jrnl
-rw-r--r-- 1 root root        24 Feb  8 13:59 nodes.bpt
-rw-r--r-- 1 root root  25165824 Feb  8 13:59 nodes.dat
-rw-r--r-- 1 root root        16 Feb  8 13:59 nodes-data.bdf
-rw-r--r-- 1 root root  44722477 Feb  8 13:59 nodes-data.obj
-rw-r--r-- 1 root root   8388608 Feb  8 13:59 nodes.idn
-rw-r--r-- 1 root root        24 Feb  8 13:59 OSP.bpt
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 OSP.dat
-rw-r--r-- 1 root root        24 Feb  8 13:59 OSPG.bpt
-rw-r--r-- 1 root root 150994944 Feb  8 13:59 OSPG.dat
-rw-r--r-- 1 root root   8388608 Feb  8 13:59 OSPG.idn
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 OSP.idn
-rw-r--r-- 1 root root        24 Feb  8 13:59 POS.bpt
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 POS.dat
-rw-r--r-- 1 root root        24 Feb  8 13:59 POSG.bpt
-rw-r--r-- 1 root root 159383552 Feb  8 13:59 POSG.dat
-rw-r--r-- 1 root root   8388608 Feb  8 13:59 POSG.idn
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 POS.idn
-rw-r--r-- 1 root root        24 Feb  8 13:59 prefixes.bpt
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 prefixes.dat
-rw-r--r-- 1 root root        16 Feb  8 13:59 prefixes-data.bdf
-rw-r--r-- 1 root root         0 Feb  8 13:57 prefixes-data.obj
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 prefixes.idn
-rw-r--r-- 1 root root        24 Feb  8 13:59 SPO.bpt
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 SPO.dat
-rw-r--r-- 1 root root        24 Feb  8 13:59 SPOG.bpt
-rw-r--r-- 1 root root 167772160 Feb  8 13:59 SPOG.dat
-rw-r--r-- 1 root root   8388608 Feb  8 13:59 SPOG.idn
-rw-r--r-- 1 root root   8388608 Feb  8 13:57 SPO.idn
-rw-r--r-- 1 root root         2 Feb  8 13:58 tdb.lock
root@dd2493b7258a:/fuseki/databases/dsp-repo# 

I should note that the data in Data-0001 is already mostly compacted as it was restored from a previous dump, so I'd expect Data-0002 to end up having about the same size.

Config:

@prefix :           <http://base/#> .
@prefix fuseki:     <http://jena.apache.org/fuseki#> .
@prefix ja:         <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix knora-base: <http://www.knora.org/ontology/knora-base#> .
@prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:       <http://www.w3.org/2000/01/rdf-schema#> .
@prefix root:       <http://127.0.0.1:8888/$/datasets#> .
@prefix tdb2:       <http://jena.apache.org/2016/tdb#> .
@prefix text:       <http://jena.apache.org/text#> .

:indexLucene  rdf:type  text:TextIndexLucene;
        text:analyzer   [ rdf:type        text:ConfigurableAnalyzer;
                          text:filters    ( text:ASCIIFoldingFilter text:LowerCaseFilter );
                          text:tokenizer  text:WhitespaceTokenizer
                        ];
        text:directory  "/fuseki/lucene/dsp-repo";
        text:entityMap  :entMap .

[ rdf:type         fuseki:Server;
  fuseki:services  :service_tdb_all;
  ja:loadClass     "org.apache.jena.query.text.TextQuery"
] .

:tdb_dataset_readwrite
        rdf:type                tdb2:DatasetTDB2;
        tdb2:location           "/fuseki/databases/dsp-repo";
        tdb2:unionDefaultGraph  true .

:service_tdb_all  rdf:type            fuseki:Service;
        rdfs:label                    "TDB2 dsp-repo";
        fuseki:dataset                :text_dataset;
        fuseki:name                   "dsp-repo";
        fuseki:serviceQuery           "sparql" , "query";
        fuseki:serviceReadGraphStore  "get";
        fuseki:serviceReadWriteGraphStore
                "data";
        fuseki:serviceUpdate          "update";
        fuseki:serviceUpload          "upload" .

:text_dataset  rdf:type  text:TextDataset;
        text:dataset  :tdb_dataset_readwrite;
        text:index    :indexLucene .

:entMap  rdf:type          text:EntityMap;
        text:defaultField  "text";
        text:entityField   "uri";
        text:map           ( [ text:field      "text";
                               text:predicate  rdfs:label
                             ]
                             [ text:field      "text";
                               text:predicate  knora-base:valueHasString
                             ]
                             [ text:field      "text";
                               text:predicate  knora-base:valueHasComment
                             ]
                           );
        text:uidField      "uid" .

SamuelBoerlin avatar Feb 08 '24 14:02 SamuelBoerlin

A simple idea for improving this situation:

  • in the process of compaction, write the dataset into a directory with a temporary name that won't be recognized during Fuseki's startup (e.g. Data-0002-tmp instead of Data-0002)
  • when compaction is finished, rename the directory

Assuming the original (Data-0002-tmp) and the target (Data-0002) directories are on the same file system the rename/move operation could* be atomic.

*This is platform-dependent, see for instance File.renameTo().

flange-ipb avatar Feb 08 '24 14:02 flange-ipb

That's the common way to do it - the other is to write a file with the current active storage area at the start of compaction, and append the intended new location at the end of the work, move it, then then delete the file. A transaction log in other words.

Can a directory with memory-mapped files in it be renamed on MS Windows? Or does this run into the problems around https://bugs.openjdk.org/browse/JDK-4715154?

The related Files.move

has more details and seems to say "OK" """ When invoked to move a directory that is not empty then the directory is moved if it does not require moving the entries in the directory. """

afs avatar Feb 08 '24 20:02 afs

Hi again, do you perhaps have a rough timeframe when or if this issue might be addressed? Thanks!

SamuelBoerlin avatar Feb 26 '24 09:02 SamuelBoerlin

Hi @SamuelBoerlin - This issue is something that needs fixing.

Contributors to the Jena project are volunteers, often working around other commitments. The project itself does not commit to timescales because it has no resources under its control.

The use of the text index is not a factor.

afs avatar Feb 27 '24 15:02 afs

Hi @afs,

I am a colleague of @SamuelBoerlin and have a decade of Java experience and I'd like to offer to contribute and fix this problem.

However, since I am unfamiliar with the code base it would take a substantial amount of time to understand the underlying problem and find a suitable solution. I would be grateful for any support or introduction.

Do you have an idea/hint where I could start looking into?

seakayone avatar Feb 28 '24 06:02 seakayone

@seakayone Thanks for the interest in contributing, the relevant code path starts at https://github.com/apache/jena/blob/main/jena-tdb2/src/main/java/org/apache/jena/tdb2/sys/DatabaseOps.java#L249 AFAIK

rvesse avatar Feb 28 '24 09:02 rvesse

Hi @seakayone - do you have access to a MS Windows machine? (preferably a real machine, not a VM or other virtualization)

Memory mapped files in Java can be troublesome (a long standing "wil not fix" JVM issue) so verifying whether Files.move works for a memory mapped file directly or if the file is within a directory that is renamed would be very welcome.

afs avatar Feb 28 '24 13:02 afs

Thanks @rvesse I will look into that. Unfortunately today I have other priorities. I will make some time tomorrow.

do you have access to a MS Windows machine?

@afs Not yet, but that can probably be arranged, I will check it out in our company.

Memory mapped files in Java can be troublesome (a long standing "wil not fix" JVM issue)

Is there a specific bug in https://bugs.java.com/ or https://github.com/apache/jena/issues you are referring to?

seakayone avatar Feb 29 '24 08:02 seakayone

Is there a specific bug in https://bugs.java.com/ or https://github.com/apache/jena/issues you are referring to?

Yes see https://bugs.openjdk.org/browse/JDK-4715154 which was mentioned earlier in the comments

rvesse avatar Feb 29 '24 09:02 rvesse

Java bug: https://bugs.openjdk.org/browse/JDK-4715154 and related/duplicates.

I don't know the exact underlying reason it's a "won't fix" but NTFS treats open files separately to naming files while Linux FS combine them in the reference count "#inodes+#openfiles" and when it goes to zero, the file is actually deleted.

Container/VMs get complicated because the exact semantics of files are a combination of host and guest if it's a file on the host accessed by the guest.

afs avatar Feb 29 '24 09:02 afs