JanusGraph image stops responding after query timeout

Open doryosef opened this issue 4 years ago • 10 comments

I'm using the default Docker image janusgraph/janusgraph:latest (Berkeley and Lucene) and connecting with the Gremlin Console.

When a query exceeds the server's 'evaluationTimeout', the JanusGraph server stops responding.

server error:

java.util.concurrent.TimeoutException: Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g.V()]
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$1(GremlinExecutor.java:316)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(Thread.java:748)
1318978 [pool-6-thread-1] WARN  org.janusgraph.diskstorage.log.kcvs.KCVSLog  - Could not read messages for timestamp [2020-05-24T10:12:30.449Z] (this read will be retried)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:56)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:158)
        at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:725)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Could not start BerkeleyJE transaction
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:163)
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:47)
        at org.janusgraph.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreManagerAdapter.beginTransaction(OrderedKeyValueStoreManagerAdapter.java:68)
        at org.janusgraph.diskstorage.log.kcvs.KCVSLog.openTx(KCVSLog.java:319)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:145)
        at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:161)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        ... 9 more
Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 18.3.12) /var/lib/janusgraph/data java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.ThreadInterruptedException.wrapSelf(ThreadInterruptedException.java:105)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1844)
        at com.sleepycat.je.Environment.checkOpen(Environment.java:2697)
        at com.sleepycat.je.Environment.beginTransactionInternal(Environment.java:1409)
        at com.sleepycat.je.Environment.beginTransaction(Environment.java:1383)
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:146)
        ... 16 more
Caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 18.3.12) /var/lib/janusgraph/data java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:67)
        at com.sleepycat.je.tree.IN.latch(IN.java:547)
        at com.sleepycat.je.dbi.CursorImpl.latchBIN(CursorImpl.java:402)
        at com.sleepycat.je.dbi.CursorImpl.cloneCursor(CursorImpl.java:230)
        at com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:5252)
        at com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:5259)
        at com.sleepycat.je.Cursor.retrieveNextNoDups(Cursor.java:3550)
        at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:3312)
        at com.sleepycat.je.Cursor.getInternal(Cursor.java:1313)
        at com.sleepycat.je.Cursor.get(Cursor.java:1244)
        at com.sleepycat.je.Cursor.getNext(Cursor.java:1512)

After that query has been sent and the timeout has been exceeded, other queries that worked before get the same response:

Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g.V().limit(4).valueMap()]: null - try increasing the timeout with the :remote command
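
The trace above suggests one possible mechanism: the timeout appears to interrupt the thread while it is inside the storage layer, and the environment then marks itself invalid, so every later request fails as well. Below is a minimal JDK-only sketch of that failure mode. `FragileEnvironment` and `TimeoutDemo` are hypothetical names made up for illustration; this is not JanusGraph or BerkeleyDB JE code, and the real cause may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for a storage environment that refuses all work
// once a thread has been interrupted inside it (an assumption for this sketch).
class FragileEnvironment {
    private volatile boolean invalid = false;

    // Simulates a read taking workMillis; an interrupt poisons the environment.
    String read(String query, long workMillis) {
        if (invalid) {
            throw new IllegalStateException("Environment is invalid and must be closed");
        }
        try {
            Thread.sleep(workMillis);
            return "result of " + query;
        } catch (InterruptedException e) {
            invalid = true; // the bad state outlives the request that caused it
            throw new IllegalStateException("interrupted during " + query, e);
        }
    }
}

class TimeoutDemo {
    // Runs one query with a timeout; on timeout the worker thread is interrupted.
    static String evaluate(ExecutorService pool, FragileEnvironment env,
                           String query, long workMillis, long timeoutMillis) {
        Future<String> future = pool.submit(() -> env.read(query, workMillis));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the thread running the query
            return "timeout";
        } catch (ExecutionException e) {
            return "error: " + e.getCause().getMessage();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    // A fast query, then a slow query that times out, then the same fast query again.
    static List<String> simulate() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        FragileEnvironment env = new FragileEnvironment();
        List<String> results = new ArrayList<>();
        try {
            results.add(evaluate(pool, env, "g.V().limit(4)", 10, 2000));
            results.add(evaluate(pool, env, "g.V()", 5000, 100));
            results.add(evaluate(pool, env, "g.V().limit(4)", 10, 2000));
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
```

In this sketch the first query succeeds, the second returns "timeout", and the third, which is identical to the first, fails with the invalid-environment error. That mirrors the pattern reported here, where only a restart clears the state.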

doryosef avatar May 24 '20 12:05 doryosef

I've found the same behaviour in 0.5.3 when submitting scripts both from the console and from a connection. Once the server hits a timeout, it stops answering and reports a timeout for every subsequent request.

I can confirm that it happens with Berkeley + ES, versions 0.5.2 and 0.5.3 (when using Cassandra + ES in those versions, this doesn't happen).

cbobed avatar Jun 01 '21 07:06 cbobed

We've encountered the same issue using the full release version 0.5.3 with the Cassandra + ES backend, connecting through a JavaScript Driver, a Python Driver, and a Gremlin.sh Groovy console.

Omig12 avatar Jun 03 '21 00:06 Omig12

I faced the same issue on 0.6 (latest) + Cassandra + ES.

I have no idea why. Is there any update on how to overcome it? I had to remove all my data and then restart the engine to get it working. Does that mean the data is corrupted?

mohamad-haddad-tribo avatar Dec 15 '21 16:12 mohamad-haddad-tribo

@mohamad-haddad-tribo The exception above clearly comes from Berkeley. I'm sure you didn't get a Berkeley exception in a Cassandra setup.

farodin91 avatar Dec 16 '21 07:12 farodin91

I'm also running into the same issue with Cassandra during data ingestion using concurrent inserts.

javiramos1 avatar Feb 22 '22 14:02 javiramos1

I am using JanusGraph 0.6.0 and I can confirm this is still an issue with BerkeleyDB. Once this error occurs, the server cannot recover from it. (P.S.: I know 0.6.1 has been released, but I was encountering issues with it, so I'm sticking with 0.6.0.)

jldevezas avatar May 09 '22 09:05 jldevezas

Same for us on the in-memory backend :(

delenius avatar Nov 01 '22 17:11 delenius

The same issue with the Cassandra backend in 1.0.0

m-thirumal avatar Nov 16 '23 03:11 m-thirumal