graphene-sqlalchemy icon indicating copy to clipboard operation
graphene-sqlalchemy copied to clipboard

Add Filters

Open sabard opened this issue 2 years ago • 1 comments

Supersedes #356

sabard avatar Aug 11 '22 14:08 sabard

Codecov Report

Attention: 34 lines in your changes are missing coverage. Please review.

Comparison is base (1436807) 96.46% compared to head (e698d7c) 94.74%. Report is 1 commits behind head on master.

Files Patch % Lines
graphene_sqlalchemy/registry.py 80.88% 13 Missing :warning:
graphene_sqlalchemy/filters.py 95.25% 11 Missing :warning:
graphene_sqlalchemy/types.py 89.53% 9 Missing :warning:
graphene_sqlalchemy/converter.py 85.71% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #357      +/-   ##
==========================================
- Coverage   96.46%   94.74%   -1.72%     
==========================================
  Files           9       10       +1     
  Lines         933     1333     +400     
==========================================
+ Hits          900     1263     +363     
- Misses         33       70      +37     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Aug 11 '22 14:08 codecov-commenter

I know this is out of scope for this PR, but it would be good to add a short doc on the use of BatchSQLAlchemyConnectionField with a brief comment. It's only documented in the test files, but should also be included in the documentation.

palisadoes avatar Feb 28 '23 02:02 palisadoes

I tested the filters on our app and database and here are my first findings/questions:

  • "contains_exactly" immediately runs in out of memory: (MySQLdb._exceptions.OperationalError) (1038, 'Out of sort memory, consider increasing server sort buffer size')\n[SQL: SELECT count(*) ... This query would only return 28 entities
  • How would I the DateFilter for my datetime/date columns? Seems like the DateFilter is available but not registered
  • "and"/"or" are of type [String]
  • Probably caused by a different change but I will add it here so we can discuss it in the next call: I have a JSONType column that was returning the parsed json before, but now I get the json as a string. I register the column like this (this approach has probably changed)
@convert_sqlalchemy_type.register(JSONType)
def convert_column_to_json(*args, **kwargs):
    return graphene.types.generic.GenericScalar

PaulSchweizer avatar Mar 28 '23 07:03 PaulSchweizer

This PR has been our only blocker for upgrading for quite some time - how can we help move this forward?

forana avatar Apr 06 '23 11:04 forana

@forana I'm moving the contains_exactly implementation to a different PR so that the changes here can be merged in ASAP.

The main issue I'm still facing now that tests run both sync and async is differing async behavior when filtering over a relationship. As you can see in this failing test, the async tests is returning duplicate results.

@erikwrede or @PaulSchweizer do you all have any ideas about why this could be happening? I haven't had a chance to play around with async yet.

@PaulSchweizer will take a look at yourDateFilter and and/or comments before the call today.

sabard avatar Apr 10 '23 17:04 sabard

@jendrikjoe ^

sabard avatar Apr 10 '23 19:04 sabard

I don't have an example for this yet, but noticing that a custom filter with val typed as Dict or Any somehow ends up attempting to convert to a datetime and hitting a real cryptic error in conversion.

forana avatar Apr 28 '23 16:04 forana

  1. @erikwrede @sabard @PaulSchweizer isn't this what we discussed in our call earlier this week?
  2. @forana There should be a fix for this soon.

palisadoes avatar Apr 28 '23 16:04 palisadoes

I'm realizing that graphene-sqlalchemy-filters handled pagination, while this does not - any guidance for a recommended strategy for implementing pagination using these filters?

forana avatar May 09 '23 13:05 forana

I'm realizing that graphene-sqlalchemy-filters handled pagination, while this does not - any guidance for a recommended strategy for implementing pagination using these filters?

Why would pagination not work if we're just attaching clauses to the SQLAlchemy query?

polgfred avatar May 11 '23 15:05 polgfred

I'm realizing that graphene-sqlalchemy-filters handled pagination, while this does not - any guidance for a recommended strategy for implementing pagination using these filters?

Why would pagination not work if we're just attaching clauses to the SQLAlchemy query?

Not so much "this breaks pagination" as "this doesn't provide it, and I've realized I'll need to handle it" - I had been (incorrectly) thinking of this PR as a complete replacement for graphene-sqlalchemy-filters (and to be clear, I don't think it needs to be - just looking for recommendations/pointers before I roll my own pagination).

forana avatar May 11 '23 17:05 forana

Pagination via offset and limit has to be implemented manually at this point, it could look something like this (going off of my memory):

class PaginationConnectionField(graphene_sqlalchemy.SQLAlchemyConnectionField):
    def __init__(self,  *args, **kwargs):
        kwargs["page"] = graphene.Int()
        kwargs["itemsPerPage"] = graphene.Int()
        super().__init__(*args, **kwargs)

    @classmethod
    def get_query(cls, *args, **kwargs):
        query = SQLAlchemyConnectionField.get_query(*args, **kwargs)

        if (
            kwargs.get("page", None) is not None
            and kwargs.get("itemsPerPage", None) is not None
        ):
            query = query.offset(kwargs["page"] * kwargs["itemsPerPage"])
            query = query.limit(kwargs["itemsPerPage"])

        return query

We already briefly discussed implementing this (or a version of this) in the library, but there is no concrete plan yet.

PaulSchweizer avatar May 12 '23 10:05 PaulSchweizer

We looked into the filters a bit more in depth now and came across this issue/question regarding filtering across relationships:

The new implementation uses INNERJOINs for the filters, graphene_sqlalchemy_filters on the other hand uses EXISTS statements to filter across relationships.

The graphql syntax of the two filter systems is basically the same.

This lead to some initial confusion on our end since we were getting different results when translating our existing queries one to one into the new filter syntax.

I prepared an example to showcase the "issue", unfortunately I had no time yet to put it up somewhere but I can showcase it in a call.

In the example we have Teams, which have a number of Tasks wich in turn have a number of Todos. Users are then assigned to the Todos (Teams -> Tasks -> Todos -> User).

This is the filter, translated 1:1 from the graphene_sqlalchemy_filters syntax:

query {
  teams(
    filter: {
      tasks: {
        contains: {
          and: [
            { name: { eq: "Accounting" } }
            { todos: { contains: { assignee_id: { eq: 2 } } } }
          ]
        }
      }
    }
  )

I would expect to get:

All teams that have an 'Accounting' task that contains a Todo where a user with id 2 is assigned.

Instead we get:

All teams that EITHER have an 'Accounting' task OR a Task with a Todo where a user with id 2 is assigned.

The "correct" query to get the initially expected result with the "new filters" would look like this:

query {
  teams(
    filter: {
      and: [
        {
          tasks: {
            contains: { name: { eq: "Accounting" } }
          }
        },
         {
          tasks: {
            contains: { todos: { contains: { assignee_id: { eq: 2 } } } }
          }
        }
      ]
    }
  )

BUT this one also does not work because each occurence of "tasks" gets a unique alias, so we again end up with the wrong result. The resulting SQL statement:

SELECT DISTINCT 
teams.id AS teams_id, 
FROM teams 
INNER JOIN tasks AS tasks_1 ON teams.id = tasks_1.shot_id 
INNER JOIN tasks AS tasks_2 ON teams.id = tasks_2.shot_id INNER JOIN todos AS todos_1 ON tasks_2.id = todos_1.task_id 

Should look like this:

SELECT DISTINCT 
teams.id AS teams_id, 
FROM teams 
INNER JOIN tasks AS tasks_1 ON teams.id = tasks_1.shot_id 
INNER JOIN todos AS todos_1 ON tasks_1.id = todos_1.task_id 

The solution here would probably be to use the same alias on each "level" of relationship.

This however highlights the question: Which behavior do we want to support? INNERJOINS or EXISTS? We talked about it in my team and came to the conclusion that the grapqhl filter syntax would rather suggest the EXISTS behavior over the INNERJOIN behaviour. Let me know what you think, probably best to jump on a call and discuss it in person (I would be free this coming Monday, May 15th)

PaulSchweizer avatar May 12 '23 12:05 PaulSchweizer

@sabard @erikwrede When do you think you'll be able to merge the filters into the main branch? Is it just the Enum work that is outstanding? I really need to migrate off the old graphene-sqlalchemy-filter.

palisadoes avatar Jul 27 '23 17:07 palisadoes

@palisadoes just fixed the enum filter case. AFAIK that covers our base scope, so what's missing now is a refactor and cleanup of the code, as it still contains many special cases from previous prototypes that can be generalized to improve code quality and readability.

erikwrede avatar Jul 28 '23 17:07 erikwrede

thanks for figuring out the issue with enums @erikwrede!

@palisadoes i've been pretty swamped this month unfortunately. hope to have more time to work on this in august

sabard avatar Jul 31 '23 18:07 sabard

@erikwrede Sorry about missing the meeting. Are there any more outstanding PRs for the filtering?

palisadoes avatar Dec 04 '23 20:12 palisadoes