zookeeper icon indicating copy to clipboard operation
zookeeper copied to clipboard

ZOOKEEPER-4475: Fix NodeChildrenChanged delivered to recursive watcher

Open kezhuw opened this issue 3 years ago • 10 comments

Cause:

  • Events are delivered to recursive watcher unconditionally.
  • Client could receive NodeChildrenChanged due to child watcher on recursive watching node's descendants.

Changes: Filter out NodeChildrenChanged events for recursive watcher in client side.

kezhuw avatar Feb 22 '22 11:02 kezhuw

Ping @eolivelli @Randgalt for review.

kezhuw avatar Feb 25 '22 14:02 kezhuw

Another ping @eolivelli @Randgalt @symat @maoling for review.

kezhuw avatar Mar 09 '22 03:03 kezhuw

@eolivelli @Randgalt @symat @maoling @arshadmohammad @anmolnar Anyone have time capacity to review this pr ? I think @Randgalt has confirmed this as bug in ZOOKEEPER-4466.

kezhuw avatar Apr 06 '22 02:04 kezhuw

Some other concerns:

  1. The description in this patch is different from JIRA's content:

This requires server to never deliver NodeChildrenChanged for node's descendants.

In this patch we filter event on the client side, which has nothing to do with server side logics.

  1. Do you have a real world workload needs this patch? I'm glad to see a concrete use case for PersistentRecursiveWatch and how the behavior changed here affects the application logic.

tisonkun avatar Jun 28 '22 16:06 tisonkun

@tisonkun Thank you for reviews. I have pushed fixup commit to resolve your comments. Regarding your concerns, I try to reply them here:

  1. The next sentence after the quoted one is "It can not be true.". From server's perspective, it is valid to attach standard child watches on descendants of node which is being watched in persistent recursive mode. So, basically, server is innocent to this. It has to be solved in client side. I explains this in code comments in fixup commit also.
  2. No. I observed this in writing tests for zookeeper-client-rust. In the first place, I observed quirk behaviors in situations where standard watches and persistent watches are attached on overlapping paths(ZOOKEEPER-4466). After discussion with @Randgalt in ZOOKEEPER-4466, I split out ZOOKEEPER-4475 as a bug report and ZOOKEEPER-4466 remains more like a feature request.

kezhuw avatar Jun 29 '22 03:06 kezhuw

@tisonkun would you have time to test if Curator still works well with this change ? like running all the tests of Curator

eolivelli avatar Jun 30 '22 13:06 eolivelli

I'm glad to see a concrete use case for PersistentRecursiveWatch @tisonkun for instance Apache Pulsar relies heavily on PersistentRecursiveWatch since version 2.8.0 you can take a look to Pulsar code

eolivelli avatar Jun 30 '22 13:06 eolivelli

@eolivelli I'll try to do a local test this week. Ping me if I miss it :)

tisonkun avatar Jun 30 '22 15:06 tisonkun

@eolivelli since Curator doesn't support ZK 3.7+, I'm trying to pick this commit to 3.6.4-SNAPSHOT and run tests. However, I get a compile error when build with 3.6.4:

[ERROR] Bundle org.apache.zookeeper:zookeeper-jute:jar:3.6.4-SNAPSHOT : The default package '.' is not permitted by the Import-Package syntax.
 This can be caused by compile errors in Eclipse because Eclipse creates
valid class files regardless of compile errors.
The following package(s) import from the default package [org.apache.zookeeper.data, org.apache.zookeeper.proto, org.apache.zookeeper.server.persistence, org.apache.zookeeper.server.quorum, org.apache.zookeeper.txn]

I have no idea whether this failure comes from.

tisonkun avatar Jun 30 '22 15:06 tisonkun

Hi @eolivelli, I have done a test based on #1859(which is a superset of this pr) and apache/curator#426 on branch-3.7. Follow is the verify progress copied from my shell. From my observation, there are flakes, but not failures.

PS: How the community share test report ? (I have deleted the test report from this reply as it is too long. You could see it from initial edit.)

kezhuw avatar Jul 04 '22 00:07 kezhuw

:warning: 52 God Classes were detected by Lift in this project. Visit the Lift web console for more details.

sonatype-lift[bot] avatar Oct 10 '22 01:10 sonatype-lift[bot]

cc @eolivelli @Randgalt @symat @maoling @arshadmohammad @anmolnar for reviews.

kezhuw avatar Oct 17 '22 14:10 kezhuw

cc @eolivelli @Randgalt @symat @maoling @arshadmohammad @anmolnar @ztzg for reviews.

kezhuw avatar Feb 22 '23 01:02 kezhuw

Pending to merge if no more objection this week.

tisonkun avatar Feb 22 '23 09:02 tisonkun

@tisonkun in ZooKeeper we do not allow to merge this kind of changes if there are not at least 2 sponsors.

Unfortunately currently there are not many active reviewers, but please always wait for a second approval before merging non trivial patches

eolivelli avatar Feb 22 '23 09:02 eolivelli

@eolivelli OK.

tisonkun avatar Feb 22 '23 09:02 tisonkun

@eolivelli I agree that it changes the event delivered and am OK to deliver it on 3.9.0, while it can be regarded as a bug fix since in 3.6.0 release notes we write:

Note that NodeChildrenChanged events are not triggered for persistent recursive watches as it would be redundant.

tisonkun avatar Feb 22 '23 09:02 tisonkun