accumulo Only call getData in ZooCache

While looking at #1422 I noticed that zoocache calls exists() and then getData(). I think it may be able to only call getData() and catch NoNodeException to know something does not exists.

https://github.com/apache/accumulo/blob/6b54a5d89aa10103105db1a6c6c88903fcaa90e6/core/src/main/java/org/apache/accumulo/fate/zookeeper/ZooCache.java#L408

Nov 12 '19 19:11 keith-turner

Hello, I would like to work on this if it's still available. Thanks!

Nov 13 '19 23:11 acastles24

Sure @Jman24 . Let me know if you have any questions.

Nov 14 '19 00:11 keith-turner

It looks to me like the exists() method must be called before getData() since getData() in ZooKeeper requires a Stat variable which is generated by exists(). Am I interpreting this correctly? If so can Stat used by getData() be null?

Nov 16 '19 16:11 acastles24

Read the comment above the call to exists(). It explains why it is doing this.

/* * The following call to exists() is important, since we are caching that a node does not * exist. Once the node comes into existence, it will be added to the cache. But this * notification of a node coming into existence will only be given if exists() was * previously called. If the call to exists() is bypassed and only getData() is called with * a special case that looks for Code.NONODE in the KeeperException, then non-existence can * not be cached. */

Nov 16 '19 21:11 jzgithub1

Read the comment above the call to exists(). It explains why it is doing this.

Good catch, I think I wrote that comment a very long time ago. Thinking something like the following may avoid the two calls to zookeeper when something exists.

        cacheWriteLock.lock();
        try {
          final ZooKeeper zooKeeper = getZooKeeper();
          Stat stat;
          byte[] data = null;
         
          while(true){
            try {
              stat = new Stat();
              data = zooKeeper.getData(zPath, watcher, stat);
              zstat = new ZcStat(stat);
              break;
            } catch (KeeperException.NoNodeException e1) {
              // node did not exists, following call registers a watcher to trigger when it comes into existance
              stat = zooKeeper.exists(zPath, watcher);
              if(stat != null) {
                //came into existance after call to get data... retry whole thing
              } else {
                break;
              }
            }
          }
          
          put(zPath, data, zstat);
          copyStats(status, zstat);
          return data;
        } finally {
          cacheWriteLock.unlock();
        }

Would still be two calls to zookeeper when it does not exists. When I opened the issue I was thinking the exists case would be more prevalent, but now I am not sure.

Nov 16 '19 22:11 keith-turner

OK I see, thanks for the explanation. Should it just be left as is then?

Nov 17 '19 23:11 acastles24

I am inclined to think it should be left as it. Let @keith-turner give you direction on how to proceed. Thank you for your investigation of this issue.

Nov 18 '19 01:11 jzgithub1

Should it just be left as is then?

I did a quick survey of the code to see how Accumulo is using node data from zoocache. I found the code is using it to obtain the following information.

servers locks and locations
tables name to id mapping
root tablet metadata
accumulo instance id
accumulo user information
table properties

For all of the above categories, except table properties, I think the expectation is that the data exists in ZK. Therefore for all uses cases, except table properties, I suspect the approach of calling zk.getData first then zk.exists if needed makes sense and would result in less overall RPCs to ZK.
However I don't know the proportion of calls to getData for non table props vs table props.

How table props are cached is undergoing major changes in #1422. Since the relevance of this change hinges on table properties I think it would make sense to wait until after #1422 is complete and reassess.

Nov 18 '19 15:11 keith-turner

OK sounds good, thanks for the help.

Nov 19 '19 03:11 acastles24

With the prop store changes complete, properties are no longer stored in ZooCache - I'll take a look at this now.

Jul 07 '22 21:07 EdColeman

@EdColeman any update on this?

Aug 05 '22 15:08 ctubbsii

Currently looking at other higher priority things (unless the priority of this changed?) - still intend to evaluate this wrt the prop changes that should have reduced ZooCache usage

Aug 05 '22 15:08 EdColeman

accumulo accumulo copied to clipboard

Only call getData in ZooCache

accumulo
accumulo copied to clipboard