geode icon indicating copy to clipboard operation
geode copied to clipboard

GEODE-9602: QueryObserver improvements.

Open albertogpz opened this issue 3 years ago • 9 comments

  • Make QueryObserverHolder thread-safe
  • Allow having an observer per query by means of setting the observer in the query at the start of the execution.
  • Invoke beforeIterationEvaluation and afterIterationEvaluation callbacks when query is using indexes.

For all changes:

  • [ ] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • [ ] Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • [ ] Is your initial contribution a single, squashed commit?

  • [ ] Does gradlew build run cleanly?

  • [ ] Have you written or updated unit tests to verify your changes?

  • [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

albertogpz avatar Sep 16 '21 06:09 albertogpz

This pull request fixes 2 alerts when merging d32b31c32e7ab421dccce4d5615ab373477446b2 into d0113fc2eb3a2eedfa1464aab733c7af148d4539 - view on LGTM.com

fixed alerts:

  • 2 for Dereferenced variable may be null

lgtm-com[bot] avatar Sep 16 '21 14:09 lgtm-com[bot]

This pull request introduces 1 alert and fixes 2 when merging 52fe699ad956c0c2cbb8c739b27ce6831c56a310 into 19f55add07d6a652911dc5a5e2116fcf7bf7b2f7 - view on LGTM.com

new alerts:

  • 1 for Inconsistent synchronization of getter and setter

fixed alerts:

  • 2 for Dereferenced variable may be null

lgtm-com[bot] avatar Sep 21 '21 14:09 lgtm-com[bot]

This pull request fixes 2 alerts when merging 96a15bc4e382f049367ecfccf9be58289c01aaf1 into cf1b35c083d526a02851c06f44a6eda78963d16f - view on LGTM.com

fixed alerts:

  • 2 for Dereferenced variable may be null

lgtm-com[bot] avatar Sep 21 '21 15:09 lgtm-com[bot]

Maybe the way to go here is to rewrite QueryObserverHolder as a class that hold a ThreadLocal<QueryObserverHolder> and then have ALL other product classes such as DefaultQuery just use QueryObserverHolder as an API for accessing that ThreadLocal.

kirklund avatar Oct 04 '21 17:10 kirklund

Maybe the way to go here is to rewrite QueryObserverHolder as a class that hold a ThreadLocal<QueryObserverHolder> and then have ALL other product classes such as DefaultQuery just use QueryObserverHolder as an API for accessing that ThreadLocal.

I like that idea. I did an experiment with it but saw many test cases fail: https://github.com/apache/geode/pull/6882 (https://concourse.apachegeode-ci.info/builds/84932)

The problem is that there are test cases that count on setting the observer globally and then running a query from the client. Take a look for example at: SelectStarQueryDUnitTets::testSelectStarQueryForPartitionedRegion. In these cases, you cannot set the thread local observer in the thread that will execute the query because the query is executed remotely from the client.

I have been able to change the test cases in QueryUsingFunctionContextDUnitTest and set the observer in the function executing the queries instead of globally in the servers but this approach is not valid for the test cases previously mentioned.

I am also worried about the use done by the QueryObserver to provide trace information in queries. It is not possible to set a query observer and at the same time have trace information.

We could also go for having two variables in the QueryObserverHolder: a thread local one and the current global one. Setting the instance could set the value in the global one and getting the instance could check if the thread local has a value. If it does, it would return it. Otherwise, the thread local one would be set to the global one value and then be returned. That way, during a query execution, only the first call to getInstance() should require synchronization. What do you think?

albertogpz avatar Oct 05 '21 15:10 albertogpz

@albertogpz @jhuynh1 AcceptorImpl is the class that runs server-side threads to execute requests from clients. When the thread starts, you could check a global in QueryObserverHolder and then set the ThreadLocal on each thread that it starts.

It might even be possible to support multiple QueryObservers and invoke them all for each query.

So here are some suggestions from Jason and me:

Have the test set a global in the server(s). When the server starts a AcceptorImpl thread to process that query, set that same ThreadLocal on that thread.

The query execution then adds the instance of QueryObserver to the ExecutionContext.

Don't forget to call ThreadLocal.remove() after a query completes.

OrderByComparator has a call to QueryObserverHolder.getInstance(). Change that to just fetch the observer from the context field in that class.

It might even be possible to avoid the ThreadLocal by using a unique observer instance per execution context. I envisioned stuffing a unique instance of QueryObserver into the ThreadLocal that is then garbage collected after the query finishes and ThreadLocal.remove has been called.

kirklund avatar Oct 12 '21 18:10 kirklund

To add to Kirk's prior suggestion (some suggestions are probably duplicated as we are currently discussing) The main difference between the two is the use of vs possible removal of the threadlocal

We might be able to avoid the thread local entirely if we add a method to the QueryObserver interface such as createObserver() where it is expected to pass back a unique instance of an observer. Improvements to reuses/pool these can be made to reduce garbage later.

Where we currently do start query, perhaps call this new createObserver() method and stuff it into the execution context. (https://github.com/apache/geode/blob/7dd614387e9f3133de77ff17675d6bf00051276c/geode-core/src/main/java/org/apache/geode/cache/query/internal/DefaultQuery.java#L342)

Wherever we currently do QueryObserverHolder.getInstance(), instead do context.getObserver()

There are a few locations of QueryObserverHolder.getInstance() like OrderbyComparator (https://github.com/apache/geode/blob/7dd614387e9f3133de77ff17675d6bf00051276c/geode-core/src/main/java/org/apache/geode/cache/query/internal/OrderByComparator.java#L116), that have a context present as a class variable that can be used.

The execution context and observer should be gc'd eventually once the execution context is no longer being used. There is probably more I am missing but thought I'd put this down as a suggestion

jhuynh1 avatar Oct 12 '21 19:10 jhuynh1

this PR appears to be abandoned, can it be closed?

onichols-pivotal avatar Feb 05 '22 08:02 onichols-pivotal

this PR appears to be abandoned, can it be closed?

It is not yet abandoned. It is pending from a discussion with @kirklund about it.

albertogpz avatar Feb 07 '22 08:02 albertogpz