jena
jena copied to clipboard
Server exit during TDB2 compaction leaves partial storage directory
Version
4.x, 5.x
What happened?
?deleteOld=true is possibly unrelated.
Server exit during the building phase of the new, compacted database (work done by CopyDSG.copy called from DatabaseOps.compact) needs to have some means that a restarted server can clear up.
There does not seem to be any persistent state error handling/recovery.
It is unclear if running on MS Windows (the JDK issue with memory mapped files) is a factor.
Relevant output and stacktrace
No response
Are you interested in making a pull request?
Yes
Hi, thanks for taking a look at this!
We're running Fuseki in a Docker container (based on eclipse-temurin:17-jre-focal which in turn is based on Ubuntu). The host is running Ubuntu 22.04.3 LTS.
Here are the relevant logs:
12:55:12 INFO Server :: Apache Jena Fuseki 4.10.0
12:55:12 INFO Config :: FUSEKI_HOME=/jena-fuseki
12:55:12 INFO Config :: FUSEKI_BASE=/fuseki
12:55:12 INFO Config :: Shiro file: file:///fuseki/shiro.ini
12:55:13 INFO Config :: Load configuration: file:///fuseki/configuration/dsp-repo.ttl
12:55:13 INFO Server :: Configuration file: /fuseki/config.ttl
12:55:13 INFO Server :: Path = /dsp-repo
12:55:13 INFO Server :: Memory: 1.0 GiB
12:55:13 INFO Server :: Java: 17.0.10
12:55:13 INFO Server :: OS: Linux 5.15.0-91-generic amd64
12:55:13 INFO Server :: PID: 7
12:55:13 INFO Server :: Started 2024/02/08 12:55:13 UTC on port 3030
<some queries>
12:57:35 INFO Admin :: [47] Compact dataset /dsp-repo
12:57:35 INFO Server :: Task : 1 : Compact
12:57:35 INFO Server :: [Task 1] starts : Compact
12:57:35 INFO Compact :: [47] >>>> Start compact /dsp-repo
12:57:35 INFO Admin :: [48] Task 1
12:57:45 INFO Admin :: [49] Task 1
12:57:55 INFO Admin :: [51] Task 1
12:58:05 INFO Admin :: [54] Task 1
12:58:15 INFO Admin :: [55] Task 1
12:58:25 INFO Admin :: [57] Task 1
<stopped fuseki>
12:58:46 INFO Server :: Apache Jena Fuseki 4.10.0
12:58:46 INFO Config :: FUSEKI_HOME=/jena-fuseki
12:58:46 INFO Config :: FUSEKI_BASE=/fuseki
12:58:46 INFO Config :: Shiro file: file:///fuseki/shiro.ini
12:58:46 INFO Config :: Load configuration: file:///fuseki/configuration/dsp-repo.ttl
12:58:47 INFO Server :: Configuration file: /fuseki/config.ttl
12:58:47 INFO Server :: Path = /dsp-repo
12:58:47 INFO Server :: Memory: 1.0 GiB
12:58:47 INFO Server :: Java: 17.0.10
12:58:47 INFO Server :: OS: Linux 5.15.0-91-generic amd64
12:58:47 INFO Server :: PID: 7
12:58:47 INFO Server :: Started 2024/02/08 12:58:47 UTC on port 3030
There are no errors in the logs and Fuseki starts up just fine after it was stopped during compaction, it's just missing most of the original data.
And these are the files in Data-0001 (original) and Data-0002 (partially compacted):
root@dd2493b7258a:/fuseki/databases/dsp-repo# ls -la Data-0001
total 8408748
drwxr-xr-x 2 root root 4096 Feb 8 04:01 .
drwxr-xr-x 4 root root 4096 Feb 8 13:57 ..
-rw-r--r-- 1 root root 24 Feb 8 04:10 GOSP.bpt
-rw-r--r-- 1 root root 1317011456 Feb 8 04:10 GOSP.dat
-rw-r--r-- 1 root root 16777216 Feb 8 04:10 GOSP.idn
-rw-r--r-- 1 root root 24 Feb 8 04:10 GPOS.bpt
-rw-r--r-- 1 root root 1342177280 Feb 8 04:10 GPOS.dat
-rw-r--r-- 1 root root 16777216 Feb 8 04:10 GPOS.idn
-rw-r--r-- 1 root root 24 Feb 8 04:10 GPU.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 GPU.dat
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 GPU.idn
-rw-r--r-- 1 root root 24 Feb 8 04:10 GSPO.bpt
-rw-r--r-- 1 root root 1375731712 Feb 8 04:10 GSPO.dat
-rw-r--r-- 1 root root 16777216 Feb 8 04:10 GSPO.idn
-rw-r--r-- 1 root root 0 Feb 8 04:10 journal.jrnl
-rw-r--r-- 1 root root 24 Feb 8 04:10 nodes.bpt
-rw-r--r-- 1 root root 176160768 Feb 8 04:10 nodes.dat
-rw-r--r-- 1 root root 16 Feb 8 04:10 nodes-data.bdf
-rw-r--r-- 1 root root 308827588 Feb 8 04:10 nodes-data.obj
-rw-r--r-- 1 root root 8388608 Feb 8 04:10 nodes.idn
-rw-r--r-- 1 root root 24 Feb 8 04:10 OSP.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 OSP.dat
-rw-r--r-- 1 root root 24 Feb 8 04:10 OSPG.bpt
-rw-r--r-- 1 root root 1317011456 Feb 8 04:10 OSPG.dat
-rw-r--r-- 1 root root 16777216 Feb 8 04:10 OSPG.idn
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 OSP.idn
-rw-r--r-- 1 root root 24 Feb 8 04:10 POS.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 POS.dat
-rw-r--r-- 1 root root 24 Feb 8 04:10 POSG.bpt
-rw-r--r-- 1 root root 1350565888 Feb 8 04:10 POSG.dat
-rw-r--r-- 1 root root 16777216 Feb 8 04:10 POSG.idn
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 POS.idn
-rw-r--r-- 1 root root 24 Feb 8 04:10 prefixes.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 prefixes.dat
-rw-r--r-- 1 root root 16 Feb 8 04:10 prefixes-data.bdf
-rw-r--r-- 1 root root 0 Feb 8 04:01 prefixes-data.obj
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 prefixes.idn
-rw-r--r-- 1 root root 24 Feb 8 04:10 SPO.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 SPO.dat
-rw-r--r-- 1 root root 24 Feb 8 04:10 SPOG.bpt
-rw-r--r-- 1 root root 1375731712 Feb 8 04:10 SPOG.dat
-rw-r--r-- 1 root root 16777216 Feb 8 04:10 SPOG.idn
-rw-r--r-- 1 root root 8388608 Feb 8 04:01 SPO.idn
-rw-r--r-- 1 root root 2 Feb 8 13:55 tdb.lock
root@dd2493b7258a:/fuseki/databases/dsp-repo# ls -la Data-0002
total 982360
drwxr-xr-x 2 root root 4096 Feb 8 13:57 .
drwxr-xr-x 4 root root 4096 Feb 8 13:57 ..
-rw-r--r-- 1 root root 24 Feb 8 13:59 GOSP.bpt
-rw-r--r-- 1 root root 150994944 Feb 8 13:59 GOSP.dat
-rw-r--r-- 1 root root 8388608 Feb 8 13:59 GOSP.idn
-rw-r--r-- 1 root root 24 Feb 8 13:59 GPOS.bpt
-rw-r--r-- 1 root root 159383552 Feb 8 13:59 GPOS.dat
-rw-r--r-- 1 root root 8388608 Feb 8 13:59 GPOS.idn
-rw-r--r-- 1 root root 24 Feb 8 13:59 GPU.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 GPU.dat
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 GPU.idn
-rw-r--r-- 1 root root 24 Feb 8 13:59 GSPO.bpt
-rw-r--r-- 1 root root 167772160 Feb 8 13:59 GSPO.dat
-rw-r--r-- 1 root root 8388608 Feb 8 13:59 GSPO.idn
-rw-r--r-- 1 root root 0 Feb 8 13:59 journal.jrnl
-rw-r--r-- 1 root root 24 Feb 8 13:59 nodes.bpt
-rw-r--r-- 1 root root 25165824 Feb 8 13:59 nodes.dat
-rw-r--r-- 1 root root 16 Feb 8 13:59 nodes-data.bdf
-rw-r--r-- 1 root root 44722477 Feb 8 13:59 nodes-data.obj
-rw-r--r-- 1 root root 8388608 Feb 8 13:59 nodes.idn
-rw-r--r-- 1 root root 24 Feb 8 13:59 OSP.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 OSP.dat
-rw-r--r-- 1 root root 24 Feb 8 13:59 OSPG.bpt
-rw-r--r-- 1 root root 150994944 Feb 8 13:59 OSPG.dat
-rw-r--r-- 1 root root 8388608 Feb 8 13:59 OSPG.idn
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 OSP.idn
-rw-r--r-- 1 root root 24 Feb 8 13:59 POS.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 POS.dat
-rw-r--r-- 1 root root 24 Feb 8 13:59 POSG.bpt
-rw-r--r-- 1 root root 159383552 Feb 8 13:59 POSG.dat
-rw-r--r-- 1 root root 8388608 Feb 8 13:59 POSG.idn
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 POS.idn
-rw-r--r-- 1 root root 24 Feb 8 13:59 prefixes.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 prefixes.dat
-rw-r--r-- 1 root root 16 Feb 8 13:59 prefixes-data.bdf
-rw-r--r-- 1 root root 0 Feb 8 13:57 prefixes-data.obj
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 prefixes.idn
-rw-r--r-- 1 root root 24 Feb 8 13:59 SPO.bpt
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 SPO.dat
-rw-r--r-- 1 root root 24 Feb 8 13:59 SPOG.bpt
-rw-r--r-- 1 root root 167772160 Feb 8 13:59 SPOG.dat
-rw-r--r-- 1 root root 8388608 Feb 8 13:59 SPOG.idn
-rw-r--r-- 1 root root 8388608 Feb 8 13:57 SPO.idn
-rw-r--r-- 1 root root 2 Feb 8 13:58 tdb.lock
root@dd2493b7258a:/fuseki/databases/dsp-repo#
I should note that the data in Data-0001 is already mostly compacted as it was restored from a previous dump, so I'd expect Data-0002 to end up having about the same size.
Config:
@prefix : <http://base/#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix knora-base: <http://www.knora.org/ontology/knora-base#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix root: <http://127.0.0.1:8888/$/datasets#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .
@prefix text: <http://jena.apache.org/text#> .
:indexLucene rdf:type text:TextIndexLucene;
text:analyzer [ rdf:type text:ConfigurableAnalyzer;
text:filters ( text:ASCIIFoldingFilter text:LowerCaseFilter );
text:tokenizer text:WhitespaceTokenizer
];
text:directory "/fuseki/lucene/dsp-repo";
text:entityMap :entMap .
[ rdf:type fuseki:Server;
fuseki:services :service_tdb_all;
ja:loadClass "org.apache.jena.query.text.TextQuery"
] .
:tdb_dataset_readwrite
rdf:type tdb2:DatasetTDB2;
tdb2:location "/fuseki/databases/dsp-repo";
tdb2:unionDefaultGraph true .
:service_tdb_all rdf:type fuseki:Service;
rdfs:label "TDB2 dsp-repo";
fuseki:dataset :text_dataset;
fuseki:name "dsp-repo";
fuseki:serviceQuery "sparql" , "query";
fuseki:serviceReadGraphStore "get";
fuseki:serviceReadWriteGraphStore
"data";
fuseki:serviceUpdate "update";
fuseki:serviceUpload "upload" .
:text_dataset rdf:type text:TextDataset;
text:dataset :tdb_dataset_readwrite;
text:index :indexLucene .
:entMap rdf:type text:EntityMap;
text:defaultField "text";
text:entityField "uri";
text:map ( [ text:field "text";
text:predicate rdfs:label
]
[ text:field "text";
text:predicate knora-base:valueHasString
]
[ text:field "text";
text:predicate knora-base:valueHasComment
]
);
text:uidField "uid" .
A simple idea for improving this situation:
- in the process of compaction, write the dataset into a directory with a temporary name that won't be recognized during Fuseki's startup (e.g.
Data-0002-tmpinstead ofData-0002) - when compaction is finished, rename the directory
Assuming the original (Data-0002-tmp) and the target (Data-0002) directories are on the same file system the rename/move operation could* be atomic.
*This is platform-dependent, see for instance File.renameTo().
That's the common way to do it - the other is to write a file with the current active storage area at the start of compaction, and append the intended new location at the end of the work, move it, then then delete the file. A transaction log in other words.
Can a directory with memory-mapped files in it be renamed on MS Windows? Or does this run into the problems around https://bugs.openjdk.org/browse/JDK-4715154?
The related Files.move
has more details and seems to say "OK" """ When invoked to move a directory that is not empty then the directory is moved if it does not require moving the entries in the directory. """
Hi again, do you perhaps have a rough timeframe when or if this issue might be addressed? Thanks!
Hi @SamuelBoerlin - This issue is something that needs fixing.
Contributors to the Jena project are volunteers, often working around other commitments. The project itself does not commit to timescales because it has no resources under its control.
The use of the text index is not a factor.
Hi @afs,
I am a colleague of @SamuelBoerlin and have a decade of Java experience and I'd like to offer to contribute and fix this problem.
However, since I am unfamiliar with the code base it would take a substantial amount of time to understand the underlying problem and find a suitable solution. I would be grateful for any support or introduction.
Do you have an idea/hint where I could start looking into?
@seakayone Thanks for the interest in contributing, the relevant code path starts at https://github.com/apache/jena/blob/main/jena-tdb2/src/main/java/org/apache/jena/tdb2/sys/DatabaseOps.java#L249 AFAIK
Hi @seakayone - do you have access to a MS Windows machine? (preferably a real machine, not a VM or other virtualization)
Memory mapped files in Java can be troublesome (a long standing "wil not fix" JVM issue) so verifying whether Files.move works for a memory mapped file directly or if the file is within a directory that is renamed would be very welcome.
Thanks @rvesse I will look into that. Unfortunately today I have other priorities. I will make some time tomorrow.
do you have access to a MS Windows machine?
@afs Not yet, but that can probably be arranged, I will check it out in our company.
Memory mapped files in Java can be troublesome (a long standing "wil not fix" JVM issue)
Is there a specific bug in https://bugs.java.com/ or https://github.com/apache/jena/issues you are referring to?
Is there a specific bug in https://bugs.java.com/ or https://github.com/apache/jena/issues you are referring to?
Yes see https://bugs.openjdk.org/browse/JDK-4715154 which was mentioned earlier in the comments
Java bug: https://bugs.openjdk.org/browse/JDK-4715154 and related/duplicates.
I don't know the exact underlying reason it's a "won't fix" but NTFS treats open files separately to naming files while Linux FS combine them in the reference count "#inodes+#openfiles" and when it goes to zero, the file is actually deleted.
Container/VMs get complicated because the exact semantics of files are a combination of host and guest if it's a file on the host accessed by the guest.