HDDS-7284. JVM crash for rocksdb for read/write after close

Open sumitagrawl opened this issue 2 years ago • 4 comments

What changes were proposed in this pull request?

Added a check that the DB has not been closed before each DB access, plus a counter of operations in progress. Close uses this counter to make sure in-flight operations have finished, and waits at most 5 seconds before forcing the close.
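A minimal sketch of the approach described above, not the code from the patch: `GuardedStore`, `Op`, and `access` are hypothetical names, and the comment in `close()` stands in for releasing the real RocksDB handle.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper; names are illustrative, not the patch's classes. */
class GuardedStore implements AutoCloseable {
  /** Stand-in for a native-backed operation such as a RocksDB get or put. */
  interface Op<T> {
    T run() throws IOException;
  }

  private final AtomicBoolean closed = new AtomicBoolean(false);
  private final AtomicInteger inFlight = new AtomicInteger(0);
  private final long maxWaitMillis;

  GuardedStore(long maxWaitMillis) {
    this.maxWaitMillis = maxWaitMillis;
  }

  <T> T access(Op<T> op) throws IOException {
    // Register first, then check the flag, so close() either sees this
    // operation in the counter or this operation sees closed == true.
    inFlight.incrementAndGet();
    try {
      if (closed.get()) {
        throw new IOException("DB is closed");
      }
      return op.run();
    } finally {
      inFlight.decrementAndGet();
    }
  }

  @Override
  public void close() {
    if (!closed.compareAndSet(false, true)) {
      return; // already closed
    }
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    // Bounded wait: give in-flight operations a chance to drain.
    while (inFlight.get() > 0 && deadline - System.nanoTime() > 0) {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
    // The native handle would be released here (assumption). If the deadline
    // expired with operations still running, this is the "force close" case.
  }
}
```

With this shape, `new GuardedStore(5000)` corresponds to the 5 second limit, and any `access` call made after `close()` fails with an `IOException` instead of reaching native code.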

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-7284

How was this patch tested?

A test checks that an IOException is thrown when the DB is accessed after close; earlier this caused a JVM crash.

Oct 06 '22 06:10 sumitagrawl

@nandakumar131 Please review

Oct 06 '22 06:10 sumitagrawl

@sumitagrawl, thanks for reporting and fixing this issue. The JVM crashes because there are still reads/writes after close, which points to an issue in the current Recon code: threads that read/write RocksDB should be stopped before RocksDB is closed. Can we try to do that first? We should guarantee that RocksDB is closed last, rather than adding a layer of mechanism to mitigate the code issue.

Regarding the two solutions, I would prefer solution two, because with solution one you cannot be sure how long a wait is enough.
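The ordering suggested here (stop the threads that read/write, then close RocksDB) could look roughly like the sketch below. The `ExecutorService` and the `Closeable` are assumptions standing in for Recon's actual task threads and DB handle, which this thread does not show.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; not Recon's actual shutdown code. */
class OrderedShutdown {
  static void stopThenClose(ExecutorService workers, Closeable db)
      throws IOException, InterruptedException {
    // Stop accepting new tasks; tasks already submitted still run.
    workers.shutdown();
    // Wait until every submitted task has finished.
    while (!workers.awaitTermination(1, TimeUnit.SECONDS)) {
      // still draining, keep waiting
    }
    // Only now is it safe to release the DB: no worker can touch it.
    db.close();
  }
}
```

This only helps if every thread that touches the DB is owned by an executor (or is otherwise joinable) that the shutdown path knows about.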

Oct 10 '22 05:10 ChenSammi

As discussed:

  1. Solution 1 is implemented to avoid the integration test crash until graceful shutdown is properly implemented without a performance impact.
  2. Graceful shutdown for Recon still needs to be implemented, but it is low priority.

Oct 11 '22 04:10 sumitagrawl

To avoid a JVM crash dump, close now waits indefinitely until all in-progress usages have completed.
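One way to get a close that waits with no timeout is a read/write lock, sketched below. This is a swapped-in technique for illustration, not necessarily how the patch does it (the description mentions a counter), and `BlockingCloseStore`, `Op`, and `access` are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical wrapper; illustrates an unbounded wait on close. */
class BlockingCloseStore implements AutoCloseable {
  /** Stand-in for a native-backed operation such as a RocksDB get or put. */
  interface Op<T> {
    T run() throws IOException;
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private boolean closed; // guarded by lock

  <T> T access(Op<T> op) throws IOException {
    // Many operations may hold the read lock at the same time.
    lock.readLock().lock();
    try {
      if (closed) {
        throw new IOException("DB is closed");
      }
      return op.run();
    } finally {
      lock.readLock().unlock();
    }
  }

  @Override
  public void close() {
    // Blocks, with no timeout, until every in-progress access() has returned.
    lock.writeLock().lock();
    try {
      if (closed) {
        return;
      }
      closed = true;
      // The native handle would be released here (assumption).
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

The trade-off is that a stuck operation now blocks close forever, and calling `close()` from inside an `access` callback would deadlock, since a read lock cannot be upgraded to a write lock.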

Oct 12 '22 13:10 sumitagrawl

Thanks @sumitagrawl for the fix.

Oct 25 '22 08:10 nandakumar131