ozone HDDS-7284. JVM crash for rocksdb for read/write after close

What changes were proposed in this pull request?

Added a check that the DB has not been closed before each DB access, and a counter of operations in progress. Close uses this counter to make sure in-flight operations have finished; as a fallback strategy it waits at most 5 seconds and then force-closes.
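
The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class `GuardedDb` and the methods `acquire`, `release`, and `close(long)` are hypothetical names, and the native RocksDB handle is represented only by a comment.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical wrapper; names are illustrative, not the classes touched by the patch. */
class GuardedDb {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  private final AtomicInteger inFlight = new AtomicInteger(0);

  /** Marks the start of an operation; fails with IOException once close has begun. */
  void acquire() throws IOException {
    // Increment first, then check the flag, so close() either sees this
    // operation in the counter or this operation sees the closed flag.
    inFlight.incrementAndGet();
    if (closed.get()) {
      inFlight.decrementAndGet();
      throw new IOException("DB is closed");
    }
  }

  /** Marks the end of an operation started with acquire(). */
  void release() {
    inFlight.decrementAndGet();
  }

  /**
   * Rejects new operations, waits up to maxWaitMillis for running ones,
   * then closes regardless. Returns true if all operations had finished.
   */
  boolean close(long maxWaitMillis) throws InterruptedException {
    closed.set(true);
    long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
    while (inFlight.get() > 0 && deadline - System.nanoTime() > 0) {
      Thread.sleep(10);
    }
    boolean drained = inFlight.get() == 0;
    // Assumption: the native handle would be released here.
    return drained;
  }
}
```

A caller would wrap each read or write in `acquire()` and `release()` (with `release()` in a `finally` block), and shutdown would call `close(5000)` to match the 5 second limit mentioned above. If the wait times out, the close is forced and an operation that is still running can still hit the closed handle.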

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-7284

How was this patch tested?

A test checks that an IOException is thrown when the DB is accessed after close; earlier this caused a JVM crash.

Oct 06 '22 06:10 sumitagrawl

@nandakumar131 Please review

Oct 06 '22 06:10 sumitagrawl

@sumitagrawl, thanks for reporting and fixing this issue. The JVM crashes because there are still reads/writes after close, which means there is an issue in the current Recon code: threads that read/write RocksDB should be stopped before RocksDB is closed. Can we try to do that first? We should guarantee that RocksDB is closed last, rather than adding a layer of mechanism to mitigate the code issue.

Regarding the two solutions, I would prefer solution two, because with solution one you cannot be sure how long a wait is enough.
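
The ordering suggested here (stop the threads that use the DB, then close the DB last) could look roughly like the sketch below. It is an assumption-laden illustration: `OrderedShutdown`, `FakeDb`, and `stop` are hypothetical names, `FakeDb` stands in for the real RocksDB handle, and this is not how Recon is actually structured.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class OrderedShutdown {
  /** Stand-in for the native DB handle (hypothetical). */
  static final class FakeDb implements AutoCloseable {
    final AtomicBoolean open = new AtomicBoolean(true);

    void put() {
      if (!open.get()) {
        throw new IllegalStateException("use after close");
      }
    }

    @Override
    public void close() {
      open.set(false);
    }
  }

  /**
   * Stops the worker pool first and closes the DB only once every worker has
   * terminated. Returns false (leaving the DB open) if workers did not stop.
   */
  static boolean stop(ExecutorService workers, FakeDb db, long timeoutMillis)
      throws InterruptedException {
    workers.shutdown(); // no new tasks; already submitted tasks still run
    boolean finished = workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    if (!finished) {
      workers.shutdownNow(); // interrupt stragglers and drop queued tasks
      finished = workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
    }
    if (finished) {
      db.close(); // safe: no thread can touch the DB any more
    }
    return finished;
  }
}
```

With this ordering no wait time has to be guessed for the DB itself; the cost is that every thread touching the DB must be owned by something that can be shut down.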

Oct 10 '22 05:10 ChenSammi

As discussed,

Solution 1 is done to avoid the crash in the integration test until graceful shutdown is properly implemented, and without performance impact.
Graceful shutdown for Recon needs to be created, but it is low priority.

Oct 11 '22 04:10 sumitagrawl

To avoid a JVM crash dump, close now waits indefinitely until all in-progress usages have completed.
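
One way to get a close that waits without a time limit is a monitor-guarded counter, sketched below. The class `DrainingDb` and its methods are hypothetical and only illustrate the idea; the patch itself may implement the wait differently.

```java
import java.io.IOException;

/** Hypothetical wrapper whose close() blocks until no operation is running. */
class DrainingDb {
  private final Object lock = new Object();
  private int inFlight = 0;
  private boolean closed = false;

  /** Registers an operation, or fails if close() has already started. */
  void acquire() throws IOException {
    synchronized (lock) {
      if (closed) {
        throw new IOException("DB is closed");
      }
      inFlight++;
    }
  }

  /** Unregisters an operation and wakes a waiting close() when it was the last. */
  void release() {
    synchronized (lock) {
      inFlight--;
      if (inFlight == 0) {
        lock.notifyAll();
      }
    }
  }

  /** Rejects new operations, then waits with no timeout for running ones. */
  void close() throws InterruptedException {
    synchronized (lock) {
      closed = true;
      while (inFlight > 0) {
        lock.wait(); // releases the monitor so release() can proceed
      }
      // Assumption: the native handle would be released here.
    }
  }
}
```

Because there is no timeout, close never runs while an operation is still using the handle, but an operation that never calls `release()` would block shutdown forever.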

Oct 12 '22 13:10 sumitagrawl

Thanks @sumitagrawl for the fix.

Oct 25 '22 08:10 nandakumar131
