superset icon indicating copy to clipboard operation
superset copied to clipboard

perf: Implement model specific lookups by id to improve performance

Open bkyryliuk opened this issue 3 years ago • 1 comments

SUMMARY

Currently the access checks are happening 2ce in the explore flow, once during the object lookup and later doing an explicit check. Existing implementation was spending > 500 ms in our deployment on doing object by id lookup and that contributed to ~1 - 1.5 second slowdown of the explore endpoint.

The proposed solution skips the filter on the chart / datasource lookup and let's explore to handle the access checks and return 403 if it was denied. Prior behavior was to return 404.

Out of scope

https://github.com/apache/superset/blob/f0ca158989644b793719884b52d04f93c05de1ba/superset/views/utils.py#L104 function takes > 1 second in our deployment, it is part of the explore flow while bootsrapping user data https://github.com/apache/superset/blob/f0ca158989644b793719884b52d04f93c05de1ba/superset/views/utils.py#L72 I will have separate PR to optimize the permissions lookup.

Before

Screen Shot 2022-08-03 at 1 41 41 PM

After

image Screen Shot 2022-08-04 at 8 46 54 AM

TESTING INSTRUCTIONS

[x] Deployed and tested on dropbox staging

ADDITIONAL INFORMATION

More granular breakdown

@classmethod
def find_by_id(
    cls, model_id: Union[str, int], session: Session = None
) -> Optional[Model]:
    """
    Find a model by id, if defined applies `base_filter`
    """
    with newrelic.agent.FunctionTrace("superset.dao.base.find_by_id.session", group="DAO"):
        session = session or db.session
    with newrelic.agent.FunctionTrace("superset.dao.base.find_by_id.query_contruction", group="DAO"):
        query = session.query(cls.model_cls)
        if cls.base_filter:
            data_model = SQLAInterface(cls.model_cls, session)
            query = cls.base_filter(  # pylint: disable=not-callable
                cls.id_column_name, data_model
            ).apply(query, None)
        id_filter = {cls.id_column_name: model_id}
    try:
        with newrelic.agent.FunctionTrace("superset.dao.base.find_by_id.query_execution", group="DAO"):
            return query.filter_by(**id_filter).one_or_none()
    except StatementError:
        # can happen if int is passed instead of a string or similar
        return None

Screen Shot 2022-08-03 at 5 00 28 PM

bkyryliuk avatar Aug 04 '22 15:08 bkyryliuk

Codecov Report

Merging #20974 (ab59aa4) into master (eb5369f) will decrease coverage by 0.03%. The diff coverage is 66.66%.

@@            Coverage Diff             @@
##           master   #20974      +/-   ##
==========================================
- Coverage   66.38%   66.34%   -0.04%     
==========================================
  Files        1767     1767              
  Lines       67232    67317      +85     
  Branches     7138     7138              
==========================================
+ Hits        44633    44664      +31     
- Misses      20773    20827      +54     
  Partials     1826     1826              
Flag Coverage Δ
hive 53.15% <16.66%> (-0.05%) :arrow_down:
mysql 80.92% <66.66%> (-0.11%) :arrow_down:
postgres 80.98% <66.66%> (-0.12%) :arrow_down:
presto 53.05% <16.66%> (-0.05%) :arrow_down:
python 81.46% <66.66%> (-0.12%) :arrow_down:
sqlite 79.58% <66.66%> (-0.11%) :arrow_down:
unit 50.49% <66.66%> (-0.04%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/common/query_context_processor.py 88.49% <0.00%> (-1.51%) :arrow_down:
superset/dao/base.py 95.40% <100.00%> (ø)
superset/explore/utils.py 100.00% <100.00%> (ø)
superset/explore/form_data/commands/utils.py 92.30% <0.00%> (-7.70%) :arrow_down:
superset/explore/form_data/api.py 89.24% <0.00%> (-6.46%) :arrow_down:
superset/views/datasource/utils.py 93.33% <0.00%> (-2.02%) :arrow_down:
superset/models/sql_lab.py 77.01% <0.00%> (-1.74%) :arrow_down:
superset/explore/permalink/api.py 91.37% <0.00%> (-1.73%) :arrow_down:
superset/models/helpers.py 39.40% <0.00%> (-1.38%) :arrow_down:
... and 8 more

:mega: Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

codecov[bot] avatar Aug 05 '22 17:08 codecov[bot]

I wonder if the slowness in your check permission query is because postgres did not automatically create indexes on the sqlatable_user table (ref).

superset_test=> \d sqlatable_user;
                             Table "public.sqlatable_user"
  Column  |  Type   | Collation | Nullable |                  Default                   
----------+---------+-----------+----------+--------------------------------------------
 id       | integer |           | not null | nextval('sqlatable_user_id_seq'::regclass)
 user_id  | integer |           |          | 
 table_id | integer |           |          | 
Indexes:
    "sqlatable_user_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "sqlatable_user_table_id_fkey" FOREIGN KEY (table_id) REFERENCES tables(id)
    "sqlatable_user_user_id_fkey" FOREIGN KEY (user_id) REFERENCES ab_user(id)

ktmud avatar Aug 08 '22 17:08 ktmud

I wonder if the slowness in your check permission query is because postgres did not automatically create indexes on the sqlatable_user table (ref).

superset_test=> \d sqlatable_user;
                             Table "public.sqlatable_user"
  Column  |  Type   | Collation | Nullable |                  Default                   
----------+---------+-----------+----------+--------------------------------------------
 id       | integer |           | not null | nextval('sqlatable_user_id_seq'::regclass)
 user_id  | integer |           |          | 
 table_id | integer |           |          | 
Indexes:
    "sqlatable_user_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "sqlatable_user_table_id_fkey" FOREIGN KEY (table_id) REFERENCES tables(id)
    "sqlatable_user_user_id_fkey" FOREIGN KEY (user_id) REFERENCES ab_user(id)

sqlatable_user

Yeah - we have both indexes: image image

new relic states that a lot of time is spent in python code, not sql execution. I have 2 more places I want to optimized for better perf: https://apache-superset.slack.com/archives/G013HAE6Y0K/p1659722615103529?thread_ts=1659395755.971919&cid=G013HAE6Y0K and will have PRs coming up soon

bkyryliuk avatar Aug 08 '22 20:08 bkyryliuk