curator icon indicating copy to clipboard operation
curator copied to clipboard

CURATOR-157 Avoid stack traces on close

Open bdumon opened this issue 9 years ago • 12 comments

Avoid error-level stack traces closing PathChildrenCache followed by closing CuratorFramework

See details in jira: https://issues.apache.org/jira/browse/CURATOR-157

bdumon avatar Oct 30 '14 11:10 bdumon

Looks like I neglected to respond to this for a while. Here is why I think the logic is faulty.

T2: processResult() method entry T2: synchronized (backgroundTaskMonitor) acquire lock T2: if (state.get().equals(State.CLOSED)) { } evaluates false T1: close() method entry T1: if ( state.compareAndSet(State.STARTED, State.CLOSED) ) evaluates true T1: listeners.clear() T1: client.getConnectionStateListenable().removeListener(connectionStateListener) T1 suspends waiting for the lock T2: processBackgroundResult(event) gets called on a closed connection

I think the solution is to put all of the close() method into the synchronized block, but that is not a super appealing answer. Maybe it will be fine though.

The other concern is that now we can only process a single background event at a time, which seems like it could be a pretty high price to pay.

madrob avatar Feb 06 '15 16:02 madrob

I follow your reasoning, until the last event:

T2: processBackgroundResult(event) gets called on a closed connection

There is no connection closed by NodeCache or PathChildrenCache. The point of the patch is that after NodeCache/PathChildrenCache.close() returns, they won't be making use of the ZK connection anymore, so that CuratorFramework.close() can be called without causing stacktraces.

bdumon avatar Feb 16 '15 08:02 bdumon

I don't think it matters whether or not they actually use the ZK Connection at that point. The important part is that getState() == State.CLOSED (as set by T1) and then when the processBackgroundResult call happens it will check for State.STARTED, which it will be false.

madrob avatar Feb 17 '15 16:02 madrob

Hmm, I don't see any such check? Is this in NodeCache or PathChildrenCache?

bdumon avatar Feb 18 '15 08:02 bdumon

In the JIRA issue, the example trace came from GetChildrenBuilderImpl line 166.

madrob avatar Feb 18 '15 16:02 madrob

In those stack traces, it is CuratorFramework/CuratorZookeeperClient which check if their state is started (which is stil true, since we have not closed them, we have closed the cache), not NodeCache/PathChildrenCache.

Or are you referring to something else?

bdumon avatar Feb 19 '15 11:02 bdumon

Ok, I think I got myself confused between the two close operations.

Can you add a test (probably with an artificial delay to ensure ordering) that would fail before your patch and pass after to make sure that we properly handle this? Sorry to push back on this, but I'm not sure anymore of my own ability to understand the code 100%.

madrob avatar Feb 19 '15 16:02 madrob

For PathChildrenCache, there was an existing test added as part of https://issues.apache.org/jira/browse/CURATOR-141, I've modified it a bit so that it is more likely to reproduce the problem. I also added a similar test for NodeCache.

bdumon avatar Feb 23 '15 11:02 bdumon

What's the status on this pull request? It looks like @bdumon added a test. Is the issue that there are branch conflicts? Does the error not happen in 3.0? I am getting it on 2.9.1 if I use a try block that relies on the Closable interface.

hdeadman avatar Jan 19 '16 23:01 hdeadman

@madrob Are you able to take a look at this?

cammckenzie avatar Jan 19 '16 23:01 cammckenzie

Jordan Zimmerman on [email protected] replies: Hey Mike,

Can you take a look at this?

-Jordan on close your feature please ticket

asfbot avatar Jan 19 '16 23:01 asfbot

Has this issue been fixed?

menghaoranss avatar Aug 31 '21 09:08 menghaoranss

Closed as stale. There're many changes since this patch was made in the first place. Feel free to resubmit it if it's still relevant.

BTW, NodeCache and PathChildrenCache are deprecated now, you can use CuratorCache instead to see if this is still an issue.

tisonkun avatar Sep 06 '22 13:09 tisonkun