ozone
HDDS-7284. JVM crash for rocksdb for read/write after close
What changes were proposed in this pull request?
Added a check for whether the DB is closed before each DB access, plus a counter of operations in progress. On close, this counter is used to wait until in-progress operations have finished; as a fallback strategy, close waits at most 5 seconds and then force-closes the DB.
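A minimal sketch of the guard described above, assuming a Java wrapper around the DB handle. `GuardedStore`, its methods, and the `ConcurrentHashMap` standing in for the native RocksDB handle are hypothetical illustrations, not the patch's actual code. Each operation increments the in-progress counter first and then checks the closed flag, so `close()` either sees the operation in the counter or the operation sees the flag and fails with an `IOException` instead of touching a released handle.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical wrapper: a ConcurrentHashMap stands in for the native RocksDB handle.
class GuardedStore implements AutoCloseable {
  // Fallback: stop waiting for in-progress operations after 5 seconds.
  private static final long MAX_CLOSE_WAIT_MILLIS = TimeUnit.SECONDS.toMillis(5);

  private final Map<String, String> db = new ConcurrentHashMap<>();
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private final AtomicInteger inProgress = new AtomicInteger(0);
  private volatile boolean released = false;

  // Register the operation first, then check the flag, so close() either
  // sees this operation in the counter or this operation sees the flag.
  private void enter() throws IOException {
    inProgress.incrementAndGet();
    if (closed.get()) {
      inProgress.decrementAndGet();
      throw new IOException("DB is closed");
    }
  }

  private void exit() {
    inProgress.decrementAndGet();
  }

  String get(String key) throws IOException {
    enter();
    try {
      return db.get(key);
    } finally {
      exit();
    }
  }

  void put(String key, String value) throws IOException {
    enter();
    try {
      db.put(key, value);
    } finally {
      exit();
    }
  }

  @Override
  public void close() {
    if (!closed.compareAndSet(false, true)) {
      return; // already closed by another caller
    }
    long deadline = System.currentTimeMillis() + MAX_CLOSE_WAIT_MILLIS;
    while (inProgress.get() > 0 && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    // Force close: in real code this would release the native handle,
    // even if an operation is still running after the timeout.
    db.clear();
    released = true;
  }

  boolean isReleased() {
    return released;
  }
}
```

Because `close()` gives up after 5 seconds, an operation still running at that point can still reach a released handle; this is the weakness raised in the review below.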
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-7284
How was this patch tested?
A test was added to check that an IOException is thrown when the DB is accessed after close; previously, such access caused a JVM crash.
@nandakumar131 Please review
@sumitagrawl, thanks for reporting and fixing this issue. Since the JVM crashes because there are still reads/writes after close, there is an issue in the current Recon code: threads that read/write RocksDB should be stopped before RocksDB is closed. Can we try to do that first? We should guarantee that RocksDB is closed at the end, rather than adding a layer of mechanism to mitigate the code issue.
Of the two solutions, I would prefer solution two, because with solution one you cannot be sure how long a wait is long enough.
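The ordering suggested in this review can be sketched as: stop the worker threads, wait until they have actually terminated, and only then close the DB. `OrderedShutdown` is hypothetical, and an `AtomicBoolean` flag stands in for the RocksDB handle; the `ExecutorService` calls (`shutdown`, `awaitTermination`) are standard JDK APIs, while everything else is illustrative and not Recon's actual code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical component: threads that use the DB are stopped before the DB is closed.
class OrderedShutdown {
  private final ExecutorService workers = Executors.newFixedThreadPool(2);
  private final AtomicBoolean dbOpen = new AtomicBoolean(true); // stand-in for the RocksDB handle
  private final AtomicInteger accessesAfterClose = new AtomicInteger();
  private final AtomicInteger completed = new AtomicInteger();

  void submitDbTask() {
    workers.execute(() -> {
      try {
        Thread.sleep(50); // simulate work before touching the DB
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      if (!dbOpen.get()) {
        // With real RocksDB, this kind of access is what crashed the JVM.
        accessesAfterClose.incrementAndGet();
      }
      completed.incrementAndGet();
    });
  }

  void stop() throws InterruptedException {
    workers.shutdown(); // reject new tasks; queued and running tasks still finish
    while (!workers.awaitTermination(1, TimeUnit.SECONDS)) {
      // Keep waiting: no guessed timeout decides when the DB is closed.
    }
    dbOpen.set(false); // close the DB only after every worker has terminated
  }

  int accessesAfterClose() { return accessesAfterClose.get(); }

  int completed() { return completed.get(); }

  boolean isDbOpen() { return dbOpen.get(); }
}
```

The design point is that the DB's lifetime is tied to thread termination rather than to a fixed wait, which addresses the concern that no single timeout is guaranteed to be long enough.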
As discussed,
- Solution 1 is used to avoid the integration-test crash, without performance impact, until graceful shutdown is properly implemented.
- Graceful shutdown for Recon still needs to be implemented, but it is low priority.
To avoid a JVM crash dump, close now waits indefinitely until all in-progress usages have completed.
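One way to express "close waits indefinitely until all usages complete" is a read-write lock: each operation holds the read lock, and `close()` takes the write lock, which blocks with no timeout until every in-flight operation has released its read lock. This sketch swaps in `ReentrantReadWriteLock` in place of the counter described in the PR, and `BlockingCloseStore` is hypothetical; the patch itself may implement the unbounded wait differently.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical wrapper: a ConcurrentHashMap stands in for the native RocksDB handle.
class BlockingCloseStore implements AutoCloseable {
  private final Map<String, String> db = new ConcurrentHashMap<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private boolean closed = false; // only read or written while holding the lock

  String get(String key) throws IOException {
    lock.readLock().lock(); // shared: many operations may run at once
    try {
      if (closed) {
        throw new IOException("DB is closed");
      }
      return db.get(key);
    } finally {
      lock.readLock().unlock();
    }
  }

  void put(String key, String value) throws IOException {
    lock.readLock().lock();
    try {
      if (closed) {
        throw new IOException("DB is closed");
      }
      db.put(key, value);
    } finally {
      lock.readLock().unlock();
    }
  }

  @Override
  public void close() {
    // Exclusive: blocks, with no timeout, until every in-flight operation releases its read lock.
    lock.writeLock().lock();
    try {
      if (!closed) {
        closed = true;
        db.clear(); // in real code: release the native handle
      }
    } finally {
      lock.writeLock().unlock();
    }
  }

  boolean isClosed() {
    lock.readLock().lock();
    try {
      return closed;
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Reads and writes do not block each other here because both only take the read lock (the underlying map is itself thread-safe). The cost of the unbounded wait is that an operation that never finishes keeps `close()` blocked forever.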
Thanks @sumitagrawl for the fix.