qEndpoint
error in LongArrayDisk when trying to run qepSearch.sh on large HDT file
Part of the endpoint? (leave empty if you don't know)
- [ ] Backend (qendpoint-backend)
- [ ] Store (qendpoint-store)
- [ ] Core (qendpoint-core)
- [ ] Frontend (qendpoint-frontend)
- [X] Other
Description of the issue
I'm trying to create an index for a huge HDT file (29,773,033,292 triples) by running qepSearch.sh.
Expected behavior
I expect a file mytriples.hdt.index.v1-1 to be generated, and then to be able to search for triples.
Obtained behavior
After about 20 minutes, I get this output:
10:16:06,369 |-INFO in ch.qos.logback.classic.LoggerContext[default] - This is logback-classic version 1.4.5
10:16:06,441 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Could NOT find resource [logback-test.xml]
10:16:06,446 |-INFO in ch.qos.logback.classic.LoggerContext[default] - Found resource [logback.xml] at [jar:file:/home/balhoff/qendpoint-cli-1.16.1/lib/qendpoint-1.16.1.jar!/logback.xml]
10:16:06,448 |-WARN in ch.qos.logback.classic.util.DefaultJoranConfigurator@45b9a632 - Resource [logback.xml] occurs multiple times on the classpath.
10:16:06,448 |-WARN in ch.qos.logback.classic.util.DefaultJoranConfigurator@45b9a632 - Resource [logback.xml] occurs at [jar:file:/home/balhoff/qendpoint-cli-1.16.1/lib/qendpoint-1.16.1.jar!/logback.xml]
10:16:06,448 |-WARN in ch.qos.logback.classic.util.DefaultJoranConfigurator@45b9a632 - Resource [logback.xml] occurs at [jar:file:/home/balhoff/qendpoint-cli-1.16.1/lib/qendpoint-backend-1.16.1.jar!/logback.xml]
10:16:06,455 |-INFO in ch.qos.logback.core.joran.spi.ConfigurationWatchList@25d250c6 - URL [jar:file:/home/balhoff/qendpoint-cli-1.16.1/lib/qendpoint-1.16.1.jar!/logback.xml] is not of type file
10:16:06,610 |-INFO in ch.qos.logback.core.model.processor.AppenderModelHandler - Processing appender named [STDOUT]
10:16:06,611 |-INFO in ch.qos.logback.core.model.processor.AppenderModelHandler - About to instantiate appender of type [ch.qos.logback.core.ConsoleAppender]
10:16:06,620 |-INFO in ch.qos.logback.core.model.processor.ImplicitModelHandler - Assuming default type [ch.qos.logback.classic.encoder.PatternLayoutEncoder] for [encoder] property
10:16:06,636 |-INFO in ch.qos.logback.classic.model.processor.RootLoggerModelHandler - Setting level of ROOT logger to INFO
10:16:06,636 |-INFO in ch.qos.logback.core.model.processor.AppenderRefModelHandler - Attaching appender named [STDOUT] to Logger[ROOT]
10:16:06,637 |-INFO in ch.qos.logback.core.model.processor.DefaultProcessor@79e2c065 - End of configuration.
10:16:06,639 |-INFO in ch.qos.logback.classic.joran.JoranConfigurator@36bc55de - Registering current configuration as safe fallback point
[main][ ] 0.00 reading buffer
10:32:41.515 [main] INFO c.t.q.c.triples.impl.BitmapTriples - Count Objects in 15 min 54 sec 607 ms 784 us Max was: 2137208329
10:33:28.540 [main] INFO c.t.q.c.triples.impl.BitmapTriples - Bitmap in 47 sec 16 ms 286 us
Exception in thread "main" java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.set0(LongArrayDisk.java:236)
at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.clear(LongArrayDisk.java:289)
at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.<init>(LongArrayDisk.java:95)
at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.<init>(LongArrayDisk.java:62)
at com.the_qa_company.qendpoint.core.util.disk.LongArrayDisk.<init>(LongArrayDisk.java:58)
at com.the_qa_company.qendpoint.core.compact.sequence.SequenceLog64BigDisk.<init>(SequenceLog64BigDisk.java:80)
at com.the_qa_company.qendpoint.core.compact.sequence.SequenceLog64BigDisk.<init>(SequenceLog64BigDisk.java:72)
at com.the_qa_company.qendpoint.core.triples.impl.BitmapTriples$1.<init>(BitmapTriples.java:514)
at com.the_qa_company.qendpoint.core.triples.impl.BitmapTriples.createSequence64(BitmapTriples.java:514)
at com.the_qa_company.qendpoint.core.triples.impl.BitmapTriples.createIndexObjectMemoryEfficient(BitmapTriples.java:773)
at com.the_qa_company.qendpoint.core.triples.impl.BitmapTriples.generateIndex(BitmapTriples.java:1005)
at com.the_qa_company.qendpoint.core.hdt.impl.HDTImpl.loadOrCreateIndex(HDTImpl.java:526)
at com.the_qa_company.qendpoint.core.hdt.HDTManagerImpl.doMapIndexedHDT(HDTManagerImpl.java:99)
at com.the_qa_company.qendpoint.core.hdt.HDTManager.mapIndexedHDT(HDTManager.java:448)
at com.the_qa_company.qendpoint.tools.QEPSearch.executeHDT(QEPSearch.java:361)
at com.the_qa_company.qendpoint.tools.QEPSearch.execute(QEPSearch.java:934)
at com.the_qa_company.qendpoint.tools.QEPSearch.main(QEPSearch.java:1322)
How to reproduce
Using JDK 17.0.2, set export JAVA_OPTIONS="-Xmx500G -XX:+UseParallelGC". Then run:
qepSearch.sh mytriples.hdt
The file mytriples.hdt is 344 GB. I can provide it somehow if that would be helpful.
Endpoint version
1.16.1
Do you want to contribute to fix it?
Maybe
Something else?
No response
Most of the in-memory implementations are old and not really reliable for large datasets (1B+ triples). I suggest using only the disk implementation for this kind of workload.
To enable disk indexing, you can use these configs:
# use disk implementation
bitmaptriples.indexmethod=disk
# directory to compute the index
bitmaptriples.sequence.disk.location=disk-work-dir
# use disk locations and indexes
bitmaptriples.sequence.disk=true
bitmaptriples.sequence.disk.subindex=true
This can be done with the -config or -options parameters.
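For reference, a minimal sketch of putting the options above into a properties file and passing it to the tool. The filename disk-index.properties and the exact -config invocation are assumptions based on the parameter names mentioned here, not verified against the CLI:

```properties
# disk-index.properties (hypothetical filename)
# use disk implementation
bitmaptriples.indexmethod=disk
# directory to compute the index (should have plenty of free space
# for a 344 GB HDT file)
bitmaptriples.sequence.disk.location=disk-work-dir
# use disk locations and indexes
bitmaptriples.sequence.disk=true
bitmaptriples.sequence.disk.subindex=true
```

The file could then be supplied along the lines of `qepSearch.sh mytriples.hdt -config disk-index.properties`.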
@ate47 thank you! Your suggestion worked perfectly.