
After system restart, OrientDB fails to start

Open suneelkumarch opened this issue 1 year ago • 10 comments

OrientDB Version: 3.2.18

OS: docker image

Expected behavior

OrientDB is deployed as a container in a K8s cluster and runs normally during regular operation. After a node or cluster restart, OrientDB is expected to start and work as before.

Actual behavior

At times OrientDB fails to start and ends up in CrashLoopBackOff with the following error:

Exception <ID> in storage plocal:/orientdb/databases/OSystem: 3.2.18 (build 75890139e2e64b786a59c95b913af9fbb86c5cfc, branch UNKNOWN) [OLocalPaginatedStorage]

Complete StackTrace:

 INFO  System is started under an effective user : `999` [OEngineLocalPaginated]
 INFO  WAL maximum segment size is set to 2,511 MB [OrientDBDistributed]
 INFO  Databases directory: /orientdb/databases [OServer]
 INFO  Page size for WAL located in /orientdb/databases/OSystem is set to 4096 bytes. [CASDiskWriteAheadLog]
 INFO  DWL:OSystem: block size = 4096 bytes, maximum segment size = 506 MB [DoubleWriteLogGL]
 SEVER Exception `<ID>` in storage `plocal:/orientdb/databases/OSystem`: 3.2.18 (build 75890139e2e64b786a59c95b913af9fbb86c5cfc, branch UNKNOWN) [OLocalPaginatedStorage]
com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of atomic operation inside of storage OSystem
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.executeInsideAtomicOperation(OAtomicOperationsManager.java:146)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:531)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.getAndOpenStorage(OrientDBEmbedded.java:590)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:517)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:87)
	at com.orientechnologies.orient.core.db.OSystemDatabase.openSystemDatabase(OSystemDatabase.java:86)
	at com.orientechnologies.orient.core.db.OSystemDatabase.checkServerId(OSystemDatabase.java:165)
	at com.orientechnologies.orient.core.db.OSystemDatabase.init(OSystemDatabase.java:153)
	at com.orientechnologies.orient.server.OServer.initSystemDatabase(OServer.java:1147)
	at com.orientechnologies.orient.server.OServer.activate(OServer.java:430)
	at com.orientechnologies.orient.server.OServerMain$1.run(OServerMain.java:49)
Caused by: java.lang.NullPointerException
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeBucketSingleValueV1.<init>(CellBTreeBucketSingleValueV1.java:58)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeSingleValueV1.findBucket(CellBTreeSingleValueV1.java:1341)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v1.CellBTreeSingleValueV1.get(CellBTreeSingleValueV1.java:189)
	at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.readProperty(OClusterBasedStorageConfiguration.java:1819)
	at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.readConfiguration(OClusterBasedStorageConfiguration.java:922)
	at com.orientechnologies.orient.core.storage.config.OClusterBasedStorageConfiguration.load(OClusterBasedStorageConfiguration.java:253)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.lambda$open$1(OAbstractPaginatedStorage.java:537)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.executeInsideAtomicOperation(OAtomicOperationsManager.java:140)
	... 10 more

suneelkumarch avatar Jun 12 '24 10:06 suneelkumarch

I'm also occasionally encountering a similar NullPointerException in ODurablePage (using version 3.2.32). Here are a few example stacktraces:

SEVERE [10:45:17 10-Oct-24 EDT][com.orientechnologies.orient.core.storage.disk.OLocalPaginatedStorage] Exception `56B353D7` in storage `plocal:XXX`: 3.2.32 (build ${buildNumber}, branch UNKNOWN)
com.orientechnologies.orient.core.exception.OStorageException: Internal error happened in storage XXX please restart the server or re-open the storage to undergo the restore process and fix the error.	DB name="XXX"
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkErrorState(OAbstractPaginatedStorage.java:4587)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkOpennessAndMigration(OAbstractPaginatedStorage.java:4567)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getClusterIdByName(OAbstractPaginatedStorage.java:2186)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.getClusterIdByName(ODatabaseDocumentAbstract.java:619)
	at com.orientechnologies.orient.core.metadata.OMetadataDefault.init(OMetadataDefault.java:122)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.loadMetadata(ODatabaseDocumentEmbedded.java:348)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.init(ODatabaseDocumentEmbedded.java:205)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.newSessionInstance(OrientDBEmbedded.java:439)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:459)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$executeNoAuthorization$8(OrientDBEmbedded.java:1150)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of component operation inside of storage XXX	DB name="XXX"
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:226)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:213)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.calculateInsideComponentOperation(ODurableComponent.java:96)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.update(CellBTreeSingleValueV3.java:226)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.put(CellBTreeSingleValueV3.java:210)
	at com.orientechnologies.orient.core.index.engine.v1.OCellBTreeMultiValueIndexEngine.put(OCellBTreeMultiValueIndexEngine.java:417)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntryInternal(OAbstractPaginatedStorage.java:3270)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntry(OAbstractPaginatedStorage.java:3250)
	at com.orientechnologies.orient.core.index.OIndexMultiValues.doPutV1(OIndexMultiValues.java:207)
	at com.orientechnologies.orient.core.index.OIndexMultiValues.doPut(OIndexMultiValues.java:177)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.applyTxChanges(OAbstractPaginatedStorage.java:2584)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:2569)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2493)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2309)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.internalCommit(ODatabaseDocumentEmbedded.java:1953)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:651)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:116)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1592)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1562)
	... 16 more
Caused by: java.lang.NullPointerException
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueBucketV3.<init>(CellBTreeSingleValueBucketV3.java:58)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.allocateNewPage(CellBTreeSingleValueV3.java:1558)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitNonRootBucket(CellBTreeSingleValueV3.java:1453)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitBucket(CellBTreeSingleValueV3.java:1416)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.lambda$update$1(CellBTreeSingleValueV3.java:323)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:221)
	... 34 more

Another:

Error on formatting message 'Exception `%08X` in storage `%s`: %s'. Exception: java.lang.IllegalArgumentException: can't parse argument number: buildNumberSEVERE [10:45:17 10-Oct-24 EDT][com.orientechnologies.common.thread.ScalingThreadPoolExecutor] Exception in thread 'OrientDBEmbedded-1'
com.orientechnologies.orient.core.exception.ODatabaseException: Cannot open database 'XXX'
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:465)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.lambda$executeNoAuthorization$8(OrientDBEmbedded.java:1150)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Internal error happened in storage XXX please restart the server or re-open the storage to undergo the restore process and fix the error.	DB name="XXX"
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkErrorState(OAbstractPaginatedStorage.java:4587)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.checkOpennessAndMigration(OAbstractPaginatedStorage.java:4567)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.getClusterIdByName(OAbstractPaginatedStorage.java:2186)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.getClusterIdByName(ODatabaseDocumentAbstract.java:619)
	at com.orientechnologies.orient.core.metadata.OMetadataDefault.init(OMetadataDefault.java:122)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.loadMetadata(ODatabaseDocumentEmbedded.java:348)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.init(ODatabaseDocumentEmbedded.java:205)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.newSessionInstance(OrientDBEmbedded.java:439)
	at com.orientechnologies.orient.core.db.OrientDBEmbedded.openNoAuthorization(OrientDBEmbedded.java:459)
	... 5 more
Caused by: com.orientechnologies.orient.core.exception.OStorageException: Exception during execution of component operation inside of storage XXX	DB name="XXX"
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:226)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:213)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurableComponent.calculateInsideComponentOperation(ODurableComponent.java:96)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.update(CellBTreeSingleValueV3.java:226)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.put(CellBTreeSingleValueV3.java:210)
	at com.orientechnologies.orient.core.index.engine.v1.OCellBTreeMultiValueIndexEngine.put(OCellBTreeMultiValueIndexEngine.java:417)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntryInternal(OAbstractPaginatedStorage.java:3270)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.putRidIndexEntry(OAbstractPaginatedStorage.java:3250)
	at com.orientechnologies.orient.core.index.OIndexMultiValues.doPutV1(OIndexMultiValues.java:207)
	at com.orientechnologies.orient.core.index.OIndexMultiValues.doPut(OIndexMultiValues.java:177)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.applyTxChanges(OAbstractPaginatedStorage.java:2584)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commitIndexes(OAbstractPaginatedStorage.java:2569)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2493)
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.commit(OAbstractPaginatedStorage.java:2309)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentEmbedded.internalCommit(ODatabaseDocumentEmbedded.java:1953)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.doCommit(OTransactionOptimistic.java:651)
	at com.orientechnologies.orient.core.tx.OTransactionOptimistic.commit(OTransactionOptimistic.java:116)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1592)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentAbstract.commit(ODatabaseDocumentAbstract.java:1562)
	... 16 more
Caused by: java.lang.NullPointerException
	at com.orientechnologies.orient.core.storage.impl.local.paginated.base.ODurablePage.<init>(ODurablePage.java:75)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueBucketV3.<init>(CellBTreeSingleValueBucketV3.java:58)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.allocateNewPage(CellBTreeSingleValueV3.java:1558)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitNonRootBucket(CellBTreeSingleValueV3.java:1453)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.splitBucket(CellBTreeSingleValueV3.java:1416)
	at com.orientechnologies.orient.core.storage.index.sbtree.singlevalue.v3.CellBTreeSingleValueV3.lambda$update$1(CellBTreeSingleValueV3.java:323)
	at com.orientechnologies.orient.core.storage.impl.local.paginated.atomicoperations.OAtomicOperationsManager.calculateInsideComponentOperation(OAtomicOperationsManager.java:221)
	... 34 more

scotthoye avatar Oct 10 '24 17:10 scotthoye

Hi,

This looks like an issue in the storage logic. I remember that some fixes to the storage logic were made in earlier 3.2.x releases, so I suggest updating to the latest hotfix. It also seems that the issues are mostly around the OSystem database; if you are not using advanced features like auditing, it can be removed safely and it will be recreated. Let me know if you still have problems with the newer version of OrientDB.
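
Removing OSystem can be done more cautiously by renaming the directory instead of deleting it, so it can be restored if needed. The following is a minimal Python sketch, assuming the server is stopped and the databases directory is `/orientdb/databases` as shown in the log above. The function name `move_system_db_aside` and the `.bak-<timestamp>` suffix are hypothetical and not part of OrientDB.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def move_system_db_aside(databases_dir: str, name: str = "OSystem") -> Optional[Path]:
    """Rename <databases_dir>/<name> to a timestamped backup directory.

    Returns the backup path, or None if the directory does not exist.
    Assumption: the OrientDB server is NOT running while this is called.
    """
    source = Path(databases_dir) / name
    if not source.is_dir():
        return None
    # Hypothetical naming scheme for the backup copy.
    backup = source.with_name(f"{name}.bak-{time.strftime('%Y%m%d-%H%M%S')}")
    shutil.move(str(source), str(backup))
    return backup
```

For example, `move_system_db_aside("/orientdb/databases")` would be run before starting the server again; per the comment above, the server is then expected to recreate OSystem on startup.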

tglman avatar Oct 16 '24 12:10 tglman

@tglman If the problem were limited to the system DB (OSystem), then deleting it and restarting the OrientDB service would recreate the system DB. However, in my case the issue is observed not just with the system DB (OSystem) but with the application DB as well.
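
To check which storages are actually affected, the server log can be scanned for the "Exception ... in storage ..." lines seen in the stack traces above. This is a rough Python sketch, assuming the log line format matches the samples in this thread; the function name `affected_storages` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   Exception `56B353D7` in storage `plocal:XXX`: 3.2.32 (build ...)
# The exception id must not contain '%', so the unformatted message template
# ("Exception `%08X` in storage `%s`: %s") is skipped.
STORAGE_ERROR = re.compile(
    r"Exception `(?P<id>[^`%]+)` in storage `(?P<storage>[^`]+)`: (?P<version>\S+)"
)


def affected_storages(log_text: str) -> Counter:
    """Count storage-level exceptions per storage URL found in the log text."""
    return Counter(m.group("storage") for m in STORAGE_ERROR.finditer(log_text))
```

Passing the full server log to `affected_storages` gives a count of reported exceptions per storage URL, which shows whether only OSystem or also application databases are hit.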

suneelkumarch avatar Oct 22 '24 16:10 suneelkumarch

Thank you for the comments! A few more details/questions:

  • We originally saw this NullPointerException when using version 3.2.23. After upgrading to 3.2.32 (only two away from the latest release), we are still seeing the exceptions. We briefly tested with version 3.2.30 in between, and we did not observe the exception in that version, but it's possible that we didn't give it enough runtime to know for sure.
  • We have tried deleting everything (more than just OSystem) and starting fresh by re-populating the database, and things will work fine for a few days. But eventually the problem occurs again after a few days of runtime for no known reason. The operations performed against the database are consistent in our test scenario, so we haven't determined a sequence of operations that leads to the exception state (it seems to happen randomly).
  • If we're not using the advanced features of OSystem, is there any way to disable it to avoid this NullPointerException? Is OSystem necessary?

Thanks again!

scotthoye avatar Oct 22 '24 17:10 scotthoye

Hi,

Are you using volumes for the data folder in your k8s deployment?

Regards

tglman avatar Nov 04 '24 20:11 tglman

Thanks, @tglman. I can't speak for the original reporter, but in my case, I'm not using K8s at all, so there are no volumes for the data folder. I'm just running OrientDB locally on a Windows PC with only local access.

scotthoye avatar Nov 04 '24 20:11 scotthoye

@tglman Yes, in my case, I am using k8s volumes.

suneelkumarch avatar Nov 05 '24 16:11 suneelkumarch

> Hi,
>
> Are you using volumes for the data folder in your k8s deployment?
>
> Regards

@tglman Are there any known issues with using k8s volumes with OrientDB?

suneelkumarch avatar Nov 12 '24 12:11 suneelkumarch

Hi @suneelkumarch,

We do suggest using volumes, because container filesystems are not designed for databases, so if you are using volumes you are doing it the right way.

I was double-checking this: if you are using volumes, these errors must come from somewhere else. Did you start seeing these issues after an upgrade, or do you also get these errors when using the database with the exact same version of OrientDB with which it was created?

tglman avatar Nov 12 '24 19:11 tglman

Hi @tglman, we get these errors on the same version of OrientDB; no upgrades were performed. The problem appears after the system was restarted (not a graceful shutdown) while OrientDB was running as a Pod.

suneelkumarch avatar Nov 12 '24 21:11 suneelkumarch