JanusGraph image stops responding after query timeout

I'm using the default Docker image janusgraph/janusgraph:latest (Berkeley and Lucene) and connecting with the Gremlin Console.

When the JanusGraph server exceeds its 'evaluationTimeout', the server stops responding.

server error:

java.util.concurrent.TimeoutException: Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g.V()]
        at org.apache.tinkerpop.gremlin.groovy.engine.GremlinExecutor.lambda$eval$1(GremlinExecutor.java:316)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at java.lang.Thread.run(Thread.java:748)
1318978 [pool-6-thread-1] WARN  org.janusgraph.diskstorage.log.kcvs.KCVSLog  - Could not read messages for timestamp [2020-05-24T10:12:30.449Z] (this read will be retried)
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:56)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:158)
        at org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.run(KCVSLog.java:725)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.janusgraph.diskstorage.PermanentBackendException: Could not start BerkeleyJE transaction
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:163)
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:47)
        at org.janusgraph.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreManagerAdapter.beginTransaction(OrderedKeyValueStoreManagerAdapter.java:68)
        at org.janusgraph.diskstorage.log.kcvs.KCVSLog.openTx(KCVSLog.java:319)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:145)
        at org.janusgraph.diskstorage.util.BackendOperation$1.call(BackendOperation.java:161)
        at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)
        at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:54)
        ... 9 more
Caused by: com.sleepycat.je.ThreadInterruptedException: (JE 18.3.12) Environment must be closed, caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 18.3.12) /var/lib/janusgraph/data java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.ThreadInterruptedException.wrapSelf(ThreadInterruptedException.java:105)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1835)
        at com.sleepycat.je.dbi.EnvironmentImpl.checkOpen(EnvironmentImpl.java:1844)
        at com.sleepycat.je.Environment.checkOpen(Environment.java:2697)
        at com.sleepycat.je.Environment.beginTransactionInternal(Environment.java:1409)
        at com.sleepycat.je.Environment.beginTransaction(Environment.java:1383)
        at org.janusgraph.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:146)
        ... 16 more
Caused by: com.sleepycat.je.ThreadInterruptedException: Environment invalid because of previous exception: (JE 18.3.12) /var/lib/janusgraph/data java.lang.InterruptedException THREAD_INTERRUPTED: InterruptedException may cause incorrect internal state, unable to continue. Environment is invalid and must be closed.
        at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:67)
        at com.sleepycat.je.tree.IN.latch(IN.java:547)
        at com.sleepycat.je.dbi.CursorImpl.latchBIN(CursorImpl.java:402)
        at com.sleepycat.je.dbi.CursorImpl.cloneCursor(CursorImpl.java:230)
        at com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:5252)
        at com.sleepycat.je.Cursor.beginMoveCursor(Cursor.java:5259)
        at com.sleepycat.je.Cursor.retrieveNextNoDups(Cursor.java:3550)
        at com.sleepycat.je.Cursor.retrieveNext(Cursor.java:3312)
        at com.sleepycat.je.Cursor.getInternal(Cursor.java:1313)
        at com.sleepycat.je.Cursor.get(Cursor.java:1244)
        at com.sleepycat.je.Cursor.getNext(Cursor.java:1512)

After the query has been sent to the server and the timeout is exceeded, other queries that worked before get the same response:

Evaluation exceeded the configured 'evaluationTimeout' threshold of 30000 ms or evaluation was otherwise cancelled directly for request [g.V().limit(4).valueMap()]: null - try increasing the timeout with the :remote command
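
The traces above are consistent with one possible mechanism: the timeout cancels the running evaluation by interrupting its thread, and the storage layer treats that interrupt as fatal and refuses all later work. The following is a minimal sketch of that pattern only. `ToyStore`, `slowScan`, and `quickRead` are hypothetical names invented for illustration; this is not JanusGraph or BerkeleyJE code, and it does not confirm what the server actually does internally.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class InterruptInvalidatesStore {

    /** Hypothetical toy store: an interrupt during a read marks it permanently invalid. */
    static class ToyStore {
        private final AtomicBoolean invalid = new AtomicBoolean(false);

        /** Stands in for a long scan such as g.V(); blocks until done or interrupted. */
        void slowScan() {
            checkValid();
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                // Assumption: the store cannot trust its state after an interrupt.
                invalid.set(true);
                throw new IllegalStateException("store invalidated by interrupt", e);
            }
        }

        /** Stands in for a cheap query that normally succeeds. */
        String quickRead() {
            checkValid();
            return "ok";
        }

        private void checkValid() {
            if (invalid.get()) {
                throw new IllegalStateException("store is invalid and must be reopened");
            }
        }
    }

    /** Runs a slow request that times out, then a cheap request on the same store. */
    static String run(ToyStore store) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<?> slow = pool.submit(store::slowScan);
            try {
                slow.get(200, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Cancelling with 'true' interrupts the worker thread mid-scan.
                slow.cancel(true);
            }
            Callable<String> cheap = store::quickRead;
            Future<String> quick = pool.submit(cheap);
            try {
                return quick.get(2, TimeUnit.SECONDS);
            } catch (ExecutionException e) {
                return "failed: " + e.getCause().getMessage();
            }
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(new ToyStore()));
    }
}
```

In this sketch the second, cheap request fails even though it is unrelated to the one that timed out, which mirrors the symptom reported here. If the real cause is similar, raising the timeout would only postpone the problem; the store would still need to be reopened (for example by restarting the server) after the first interrupt.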

May 24 '20 12:05 doryosef

I've found the same behaviour in 0.5.3 when submitting scripts both from the console and from a connection. Once the server hits a timeout, it stops answering and reports a timeout for every subsequent request.

I can confirm that it happens with Berkeley + ES, versions 0.5.2 and 0.5.3 (when using Cassandra + ES in those versions, this doesn't happen).

Jun 01 '21 07:06 cbobed

We've encountered the same issue using the full release version 0.5.3 with the Cassandra + ES backend, connecting through a JavaScript Driver, a Python Driver, and a Gremlin.sh Groovy console.

Jun 03 '21 00:06 Omig12

I faced the same issue on 0.6 (latest) + Cassandra + ES.

I have no idea why. Is there any update on how to overcome it? I had to remove all my data and then re-run the engine to get it working. Does that mean the data is corrupted?

Dec 15 '21 16:12 mohamad-haddad-tribo

@mohamad-haddad-tribo The exception above clearly comes from Berkeley. I'm sure you did not get a Berkeley exception in a Cassandra setup.

Dec 16 '21 07:12 farodin91

I'm also running into the same issue with Cassandra during data ingestion using concurrent inserts.

Feb 22 '22 14:02 javiramos1

I am using JanusGraph 0.6.0 and I can confirm this is still an issue with BerkeleyDB. Once this error occurs, the server cannot recover from it. (P.S.: I know 0.6.1 has been released, but I was encountering issues with it, so I'm sticking with 0.6.0.)

May 09 '22 09:05 jldevezas

Same for us on the in-memory backend :(

Nov 01 '22 17:11 delenius

The same issue occurs with the Cassandra backend in 1.0.0.

Nov 16 '23 03:11 thirumalx