linstor-server
linstor-server copied to clipboard
Unknown error during loading data from DB
Today our production cluster stopped working:
LINSTOR, Module Controller
Version: 1.13.0 (37c02e20aa52f26ef28ce4464925d9e53327171c)
Build time: 2021-06-25T13:20:41+00:00
Java Version: 11
Java VM: Debian, Version 11.0.11+9-post-Debian-1deb10u1
Operating system: Linux, Version 5.4.0-74-generic
Environment: amd64, 1 processors, 15515 MiB memory reserved for allocations
System components initialization in progress
11:36:18.231 [main] INFO LINSTOR/Controller - SYSTEM - ErrorReporter DB version 1 found.
11:36:18.232 [main] INFO LINSTOR/Controller - SYSTEM - Log directory set to: '/logs'
11:36:18.251 [main] WARN io.sentry.dsn.Dsn - *** Couldn't find a suitable DSN, Sentry operations will do nothing! See documentation: https://docs.sentry.io/clients/java/ ***
11:36:18.261 [Main] INFO LINSTOR/Controller - SYSTEM - Loading API classes started.
11:36:18.456 [Main] INFO LINSTOR/Controller - SYSTEM - API classes loading finished: 195ms
11:36:18.456 [Main] INFO LINSTOR/Controller - SYSTEM - Dependency injection started.
11:36:18.464 [Main] INFO LINSTOR/Controller - SYSTEM - Attempting dynamic load of extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule"
11:36:18.464 [Main] INFO LINSTOR/Controller - SYSTEM - Extension module "com.linbit.linstor.spacetracking.ControllerSpaceTrackingModule" is not installed
11:36:19.009 [Main] INFO LINSTOR/Controller - SYSTEM - Dependency injection finished: 553ms
11:36:19.192 [Main] INFO LINSTOR/Controller - SYSTEM - Initializing authentication subsystem
11:36:19.272 [Main] INFO LINSTOR/Controller - SYSTEM - Starting service instance 'TimerEventService' of type TimerEventService
11:36:19.272 [Main] INFO LINSTOR/Controller - SYSTEM - Initializing the database connection pool
11:36:19.349 [Main] WARN org.flywaydb.core.api.Location - Use of dots (.) as path separators will be deprecated in Flyway 7. Path: com.linbit.linstor.dbcp.migration
11:36:19.352 [Main] INFO org.flywaydb.core.internal.license.VersionPrinter - Flyway Community Edition 6.5.7 by Redgate
11:36:19.393 [Main] INFO org.flywaydb.core.internal.database.DatabaseFactory - Database: jdbc:postgresql://linstor-db-stolon-proxy/linstor (PostgreSQL 10.12)
11:36:19.435 [Main] INFO org.flywaydb.core.internal.command.DbValidate - Successfully validated 59 migrations (execution time 00:00.024s)
11:36:19.445 [Main] INFO org.flywaydb.core.internal.command.DbMigrate - Current version of schema "LINSTOR": 2021.05.26.09.00
11:36:19.445 [Main] WARN org.flywaydb.core.internal.command.DbMigrate - outOfOrder mode is active. Migration of schema "LINSTOR" may not be reproducible.
11:36:19.446 [Main] INFO org.flywaydb.core.internal.command.DbMigrate - Schema "LINSTOR" is up to date. No migration necessary.
11:36:19.455 [Main] INFO LINSTOR/Controller - SYSTEM - Starting service instance 'DatabaseService' of type DatabaseService
11:36:19.455 [Main] INFO LINSTOR/Controller - SYSTEM - Loading security objects
11:36:19.486 [Main] INFO LINSTOR/Controller - SYSTEM - Current security level is NO_SECURITY
11:36:19.523 [Main] INFO LINSTOR/Controller - SYSTEM - Core objects load from database is in progress
11:36:35.575 [Main] ERROR LINSTOR/Controller - SYSTEM - Unknown error during loading data from DB [Report number 60F16F31-00000-000000]
11:36:35.580 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutdown in progress
11:36:35.581 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutting down service instance 'TaskScheduleService' of type TaskScheduleService
11:36:35.581 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Waiting for service instance 'TaskScheduleService' to complete shutdown
11:36:35.581 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutting down service instance 'DatabaseService' of type DatabaseService
11:36:35.582 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Waiting for service instance 'DatabaseService' to complete shutdown
11:36:35.582 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutting down service instance 'TimerEventService' of type TimerEventService
11:36:35.582 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Waiting for service instance 'TimerEventService' to complete shutdown
11:36:35.582 [Thread-2] INFO LINSTOR/Controller - SYSTEM - Shutdown complete
ERROR REPORT 60F16F31-00000-000000
============================================================
Application: LINBIT? LINSTOR
Module: Controller
Version: 1.13.0
Build ID: 37c02e20aa52f26ef28ce4464925d9e53327171c
Build time: 2021-06-25T13:20:41+00:00
Error time: 2021-07-16 11:36:35
Node: linstor-controller-656bbb99db-2k4vh
============================================================
Reported error:
===============
Category: Error
Class name: ImplementationError
Class canonical name: com.linbit.ImplementationError
Generated at: Method 'loadAll', Source file 'DatabaseLoader.java', Line #604
Error message: Unknown error during loading data from DB
Call backtrace:
Method Native Class:Line number
loadAll N com.linbit.linstor.dbdrivers.DatabaseLoader:604
loadCoreObjects N com.linbit.linstor.core.DbDataInitializer:176
initialize N com.linbit.linstor.core.DbDataInitializer:108
startSystemServices N com.linbit.linstor.core.ApplicationLifecycleManager:87
start N com.linbit.linstor.core.Controller:343
main N com.linbit.linstor.core.Controller:564
Caused by:
==========
Description:
Failed to restore data
Category: LinStorException
Class name: DatabaseException
Class canonical name: com.linbit.linstor.dbdrivers.DatabaseException
Generated at: Method 'loadAll', Source file 'AbsDatabaseDriver.java', Line #109
Error message: Failed to restore data
Call backtrace:
Method Native Class:Line number
loadAll N com.linbit.linstor.dbdrivers.AbsDatabaseDriver:109
loadAll N com.linbit.linstor.dbdrivers.DatabaseLoader:408
loadCoreObjects N com.linbit.linstor.core.DbDataInitializer:176
initialize N com.linbit.linstor.core.DbDataInitializer:108
startSystemServices N com.linbit.linstor.core.ApplicationLifecycleManager:87
start N com.linbit.linstor.core.Controller:343
main N com.linbit.linstor.core.Controller:564
Caused by:
==========
Category: Exception
Class name: MaxSizeException
Class canonical name: com.linbit.drbd.md.MaxSizeException
Generated at: Method 'checkVolumeSize', Source file 'VolumeDefinition.java', Line #177
Error message: Volume size value 1181116006400000000 is out of range [4 - 1099511627776]
Call backtrace:
Method Native Class:Line number
checkVolumeSize N com.linbit.linstor.core.objects.VolumeDefinition:177
<init> N com.linbit.linstor.core.objects.VolumeDefinition:112
load N com.linbit.linstor.core.objects.VolumeDefinitionDbDriver:144
load N com.linbit.linstor.core.objects.VolumeDefinitionDbDriver:43
restoreData N com.linbit.linstor.dbdrivers.sql.SQLEngine:345
loadAll N com.linbit.linstor.dbdrivers.sql.SQLEngine:279
loadAll N com.linbit.linstor.dbdrivers.AbsDatabaseDriver:100
loadAll N com.linbit.linstor.dbdrivers.DatabaseLoader:408
loadCoreObjects N com.linbit.linstor.core.DbDataInitializer:176
initialize N com.linbit.linstor.core.DbDataInitializer:108
startSystemServices N com.linbit.linstor.core.ApplicationLifecycleManager:87
start N com.linbit.linstor.core.Controller:343
main N com.linbit.linstor.core.Controller:564
END OF ERROR REPORT.
The problem was caused by setting incredibly large number for this volume in Kubernetes:
pvc-23821312-c087-4cfd-834c-fc5403cf435e 1209462790553600m RWO Delete Bound hosting/vol-data-web-hc1-wd17-1 linstor-hosting-2 541d
┊ m16c11 ┊ pvc-23821312-c087-4cfd-834c-fc5403cf435e ┊ thindata ┊ 0 ┊ 2926 ┊ /dev/drbd2926 ┊ 788.96 GiB ┊ Unused ┊ Resizing, UpToDate ┊
┊ m6c10 ┊ pvc-23821312-c087-4cfd-834c-fc5403cf435e ┊ thindata ┊ 0 ┊ 2926 ┊ /dev/drbd2926 ┊ 1.00 TiB ┊ ┊ Unknown ┊
solved by manually correcting the number in the "LINSTOR".volume_definitions table
Still a bug, because that volume size should not have been able to go into the database in the first place. DRBD volumes are limited in size to 1 PiB. The specified size was about 1049 EiB, which should still be well within LINSTOR's internal handling capabilities for volume sizes.