orientdb
After system restart, OrientDB fails to start
OrientDB Version: 3.2.18
OS: docker image
Expected behavior
OrientDB is deployed as a container in a K8s cluster and runs during normal operation. After a node or cluster restart, OrientDB is expected to start and work normally.
Actual behavior
At times OrientDB fails to start and ends up in CrashLoopBackOff with the following error:
Exception `<ID>` in storage `plocal:/orientdb/databases/OSystem`: 3.2.18 (build 75890139e2e64b786a59c95b913af9fbb86c5cfc, branch UNKNOWN) [OLocalPaginatedStorage]
Complete StackTrace:
INFO System is started under an effective user : `999` [OEngineLocalPaginated]
INFO WAL maximum segment size is set to 2,511 MB [OrientDBDistributed]
INFO Databases directory: /orientdb/databases [OServer]
INFO Page size for WAL located in /orientdb/databases/OSystem is set to 4096 bytes. [CASDiskWriteAheadLog]
INFO DWL:OSystem: block size = 4096 bytes, maximum segment size = 506 MB [DoubleWriteLogGL]
SEVER Exception `<ID>` in storage `plocal:/orientdb/databases/OSystem`: 3.2.18 (build 75890139e2e64b786a59c95b913af9fbb86c5cfc, branch UNKNOWN) [OLocalPaginatedStorage]
com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of atomic operation inside of storage OSystem
at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.executeInsideAtomicOperation(OAtomicOperationsManager.java:146)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:531)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.getAndOpenStorage(OrientDBEmbedded.java:590)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:517)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:87)
at com.orientechnologies.orient.core.db.OSystemDatabase.openSystemDatabase(OSystemDatabase.java:86)
at com.orientechnologies.orient.core.db.OSystemDatabase.checkServerId(OSystemDatabase.java:165)
at com.orientechnologies.orient.core.db.OSystemDatabase.init(OSystemDatabase.java:153)
at com.orientechnologies.orient.server.OServer.initSystemDatabase(OServer.java:1147)
at com.orientechnologies.orient.server.OServer.activate(OServer.java:430)
at com.orientechnologies.orient.server.OServerMain$1.run(OServerMain.java:49)
Caused by: java.lang.NullPointerException
at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeBucketSingleValueV1.<init>(CellBTreeBucketSingleValueV1.java:58)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeSingleValueV1.findBucket(CellBTreeSingleValueV1.java:1341)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeSingleValueV1.get(CellBTreeSingleValueV1.java:189)
at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.readProperty(OClusterBasedStorageConfiguration.java:1819)
at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.readConfiguration(OClusterBasedStorageConfiguration.java:922)
at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.load(OClusterBasedStorageConfiguration.java:253)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.lambda$open$1(OAbstractPaginatedStorage.java:537)
at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.executeInsideAtomicOperation(OAtomicOperationsManager.java:140)
... 10 more
I'm also occasionally encountering a similar NullPointerException in ODurablePage (using version 3.2.32). Here are a few example stacktraces:
SEVERE [10:45:17 10-Oct-24 EDT][com.orientechnologies.orient.core.storage.disk.OLocalPaginatedStorage] Exception `56B353D7` in storage `plocal:XXX`: 3.2.32 (build ${buildNumber}, branch UNKNOWN)
com.orientechnologies.orient.core.exception.OStorageException: Internal error happened in storage XXX please restart the server or re-open the storage to undergo the restore process and fix the error. DB name="XXX"
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkErrorState(OAbstractPaginatedStorage.java:4587)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkOpennessAndMigration(OAbstractPaginatedStorage.java:4567)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getClusterIdByName(OAbstractPaginatedStorage.java:2186)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.getClusterIdByName(ODatabaseDocumentAbstract.java:619)
at com.orientechnologies.orient.core.metadata.OMetadataDefault.init(OMetadataDefault.java:122)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.loadMetadata(ODatabaseDocumentEmbedded.java:348)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.init(ODatabaseDocumentEmbedded.java:205)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.newSessionInstance(OrientDBEmbedded.java:439)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:459)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$executeNoAuthorization$8(OrientDBEmbedded.java:1150)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of component operation inside of storage XXX DB name="XXX"
at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:226)
at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:213)
at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.calculateInsideComponentOperation(ODurableComponent.java:96)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.update(CellBTreeSingleValueV3.java:226)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.put(CellBTreeSingleValueV3.java:210)
at com.orientechnologies.orient.core.index.engine.v1.OCellBTreeMultiValueIndexEngine.put(OCellBTreeMultiValueIndexEngine.java:417)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntryInternal(OAbstractPaginatedStorage.java:3270)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntry(OAbstractPaginatedStorage.java:3250)
at com.orientechnologies.orient.core.index.OIndexMultiValues.doPutV1(OIndexMultiValues.java:207)
at com.orientechnologies.orient.core.index.OIndexMultiValues.doPut(OIndexMultiValues.java:177)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.applyTxChanges(OAbstractPaginatedStorage.java:2584)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:2569)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2493)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2309)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.internalCommit(ODatabaseDocumentEmbedded.java:1953)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:651)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:116)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1592)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1562)
... 16 more
Caused by: java.lang.NullPointerException
at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueBucketV3.<init>(CellBTreeSingleValueBucketV3.java:58)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.allocateNewPage(CellBTreeSingleValueV3.java:1558)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitNonRootBucket(CellBTreeSingleValueV3.java:1453)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitBucket(CellBTreeSingleValueV3.java:1416)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.lambda$update$1(CellBTreeSingleValueV3.java:323)
at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:221)
... 34 more
Another:
Error on formatting message 'Exception `%08X` in storage `%s`: %s'. Exception: java.lang.IllegalArgumentException: can't parse argument number: buildNumberSEVERE [10:45:17 10-Oct-24 EDT][com.orientechnologies.common.thread.ScalingThreadPoolExecutor] Exception in thread 'OrientDBEmbedded-1'
com.orientechnologies.orient.core.exception.ODatabaseException: Cannot open database 'XXX'
at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:465)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$executeNoAuthorization$8(OrientDBEmbedded.java:1150)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Internal error happened in storage XXX please restart the server or re-open the storage to undergo the restore process and fix the error. DB name="XXX"
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkErrorState(OAbstractPaginatedStorage.java:4587)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkOpennessAndMigration(OAbstractPaginatedStorage.java:4567)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getClusterIdByName(OAbstractPaginatedStorage.java:2186)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.getClusterIdByName(ODatabaseDocumentAbstract.java:619)
at com.orientechnologies.orient.core.metadata.OMetadataDefault.init(OMetadataDefault.java:122)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.loadMetadata(ODatabaseDocumentEmbedded.java:348)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.init(ODatabaseDocumentEmbedded.java:205)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.newSessionInstance(OrientDBEmbedded.java:439)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:459)
... 5 more
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of component operation inside of storage XXX DB name="XXX"
at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:226)
at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:213)
at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.calculateInsideComponentOperation(ODurableComponent.java:96)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.update(CellBTreeSingleValueV3.java:226)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.put(CellBTreeSingleValueV3.java:210)
at com.orientechnologies.orient.core.index.engine.v1.OCellBTreeMultiValueIndexEngine.put(OCellBTreeMultiValueIndexEngine.java:417)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntryInternal(OAbstractPaginatedStorage.java:3270)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntry(OAbstractPaginatedStorage.java:3250)
at com.orientechnologies.orient.core.index.OIndexMultiValues.doPutV1(OIndexMultiValues.java:207)
at com.orientechnologies.orient.core.index.OIndexMultiValues.doPut(OIndexMultiValues.java:177)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.applyTxChanges(OAbstractPaginatedStorage.java:2584)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:2569)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2493)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2309)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.internalCommit(ODatabaseDocumentEmbedded.java:1953)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:651)
at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:116)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1592)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1562)
... 16 more
Caused by: java.lang.NullPointerException
at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueBucketV3.<init>(CellBTreeSingleValueBucketV3.java:58)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.allocateNewPage(CellBTreeSingleValueV3.java:1558)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitNonRootBucket(CellBTreeSingleValueV3.java:1453)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitBucket(CellBTreeSingleValueV3.java:1416)
at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.lambda$update$1(CellBTreeSingleValueV3.java:323)
at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:221)
... 34 more
Hi,
This looks like an issue in the storage logic. I remember that some fixes to the storage logic were made in some early releases of 3.2.x, so I suggest updating to the latest hotfix. It also seems that the issues are mostly around the OSystem database; if you are not using advanced features such as auditing, it can be removed safely and it will be recreated. Let me know if you still have problems with the newer version of OrientDB.
@tglman if the problem were limited to the system DB (OSystem), then deleting it and restarting the OrientDB service would recreate the system DB. However, in my case the issue is observed not just with the system DB (OSystem) but with the application DB as well.
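For anyone who wants to try the OSystem removal suggested above without losing the old files, here is a minimal sketch that moves the directory aside instead of deleting it. It assumes the server is stopped first and that the databases directory is `/orientdb/databases` as in the log above. The class name, the `moveAside` helper, and the backup location are hypothetical and not part of OrientDB.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class MoveOSystemAside {

    /**
     * Moves dbRoot/OSystem into backupParent under a timestamped name.
     * Returns the new location, or null if there is no OSystem directory.
     * backupParent should be on the same filesystem as dbRoot: Files.move
     * cannot move a non-empty directory across file stores.
     */
    static Path moveAside(Path dbRoot, Path backupParent) throws IOException {
        Path osystem = dbRoot.resolve("OSystem");
        if (!Files.isDirectory(osystem)) {
            return null;
        }
        Files.createDirectories(backupParent);
        Path target = backupParent.resolve("OSystem.bak-" + System.currentTimeMillis());
        return Files.move(osystem, target);
    }

    public static void main(String[] args) throws IOException {
        // Both paths are assumptions; adjust them to your deployment.
        Path dbRoot = Paths.get(args.length > 0 ? args[0] : "/orientdb/databases");
        Path backupParent = Paths.get(args.length > 1 ? args[1] : "/orientdb/osystem-backups");
        Path moved = moveAside(dbRoot, backupParent);
        System.out.println(moved == null
                ? "No OSystem directory found under " + dbRoot
                : "Moved OSystem to " + moved);
    }
}
```

The backup is kept outside the databases directory so that the server does not see it as another database on the next start. As noted above, this only helps when the corruption is limited to OSystem.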
Thank you for the comments! A few more details/questions:
- We originally saw this NullPointerException when using version 3.2.23. After upgrading to 3.2.32 (only two away from the latest release), we are still seeing the exceptions. We briefly tested with version 3.2.30 in between, and we did not observe the exception in that version, but it's possible that we didn't give it enough runtime to know for sure.
- We have tried deleting everything (more than just OSystem) and starting fresh by re-populating the database, and things work fine at first. But the problem eventually recurs after a few days of runtime for no known reason. The operations performed against the database are consistent in our test scenario, so we haven't determined a sequence of operations that leads to the exception state (it seems to happen randomly).
- If we're not using the advanced features of OSystem, is there any way to disable it to avoid this NullPointerException? Is OSystem necessary?
Thanks again!
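Since the failure shows up only after days of runtime, it may help to find the first occurrence in the server log and compare its timestamp with what the application was doing. The following is a rough sketch that reports every line where a `NullPointerException` cause is immediately followed by an `ODurablePage.<init>` frame, as in the traces above. The class name and the log file argument are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class DurablePageNpeScanner {

    /**
     * Returns the 1-based line numbers of "Caused by: java.lang.NullPointerException"
     * lines whose next line is a frame inside the ODurablePage constructor.
     */
    static List<Integer> findNpeLines(List<String> lines) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + 1 < lines.size(); i++) {
            boolean npe = lines.get(i).trim()
                    .startsWith("Caused by: java.lang.NullPointerException");
            boolean inDurablePage = lines.get(i + 1).contains("ODurablePage.<init>");
            if (npe && inDurablePage) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DurablePageNpeScanner.java <log-file>");
            return;
        }
        // Assumes the log file is UTF-8 text.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int lineNumber : findNpeLines(lines)) {
            System.out.println("ODurablePage NPE at log line " + lineNumber);
        }
    }
}
```

The earliest reported line points at the first stack trace; the `SEVERE` header above it carries the timestamp.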
Hi,
Are you using volumes for the data folder in your k8s deployment?
Regards
Thanks, @tglman. I can't speak for the original reporter, but in my case, I'm not using K8s at all. So no volumes for the data folder. Just running OrientDB locally on a Windows PC with only local access.
@tglman Yes, in my case, I am using k8s volumes.
@tglman, are there any known issues with OrientDB when using k8s volumes?
Hi @suneelkumarch,
We do suggest using volumes, because container filesystems are not designed for databases, so if you are using volumes you are doing it the right way.
I was double-checking this: if you are using volumes, these errors should come from somewhere else. Did you start having these issues after an upgrade, or do you also get these errors when using the database with the exact same version of OrientDB with which it was created?
Hi @tglman, we get these errors on the same version of OrientDB; no upgrades were performed. The errors are seen after the system was restarted (not a graceful shutdown) while OrientDB was running as a Pod.