codecov-api icon indicating copy to clipboard operation
codecov-api copied to clipboard

perf: Fix TestResults SQL query to not take as long

Open joseph-sentry opened this issue 1 year ago • 2 comments

this commit replaces the previous django orm code with an invocation to do a raw sql query that performs much better for repos with a large amount of data.

one worry with this approach is the risk for sql injection

in this specific case, we're passing most of the dynamic user provided values used in the sql through the parameters which the django docs say are sanitized and should be used to protect against sql injection

for the user provided values that are not passed through the sql parameters they're checked to be in a strict set of values, so we shouldn't be substituting any unexpected strings into the query

joseph-sentry avatar Oct 10 '24 20:10 joseph-sentry

Codecov Report

Attention: Patch coverage is 95.90164% with 5 lines in your changes missing coverage. Please review.

Project coverage is 96.28%. Comparing base (8945e37) to head (9c16f21).

:white_check_mark: All tests successful. No failed tests found.

Files with missing lines Patch % Lines
utils/test_results.py 94.89% 5 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #880      +/-   ##
==========================================
- Coverage   96.31%   96.28%   -0.03%     
==========================================
  Files         823      823              
  Lines       19079    19130      +51     
==========================================
+ Hits        18376    18420      +44     
- Misses        703      710       +7     
Flag Coverage Δ
unit 92.64% <95.90%> (-0.02%) :arrow_down:
unit-latest-uploader 92.64% <95.90%> (-0.02%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Oct 10 '24 21:10 codecov[bot]

Codecov Report

Attention: Patch coverage is 94.57831% with 9 lines in your changes missing coverage. Please review.

:white_check_mark: All tests successful. No failed tests found.

Files with missing lines Patch % Lines
utils/test_results.py 93.57% 9 Missing :warning:

:loudspeaker: Thoughts on this report? Let us know!

https://docs.djangoproject.com/en/5.1/topics/db/sql/#passing-parameters-into-raw it looks like you can name your query parameters like %(repoid)s and then make your params list a dict instead where the key is the parameter name. that would probably make this query a lot easier to work with

this is a really good idea i will implement this

do you have a sample of the SQL that was generated with the Django ORM query? i am not a django-optimizing expert but dynamically generating a query like this is fragile enough that it's worth a second look to see if we can't improve the performance in place

the issue i ran into with Django was that doing arbitrary joins and using derived tables seems to be impossible, and i had been messing around with the SQL for a while trying to avoid using self joins and derived tables but I could not find a performant way to query this information without doing so. If we can come up with SQL that doesn't use derived tables and is performant then I'd be willing to try using Django again but until then I think it's impossible.

joseph-sentry avatar Oct 15 '24 17:10 joseph-sentry

:x: 1 Tests Failed:

Tests completed Failed Passed Skipped
2701 1 2700 6
View the top 1 failed tests by shortest run time
utils.tests.unit.test_cursor test_cursor
Stack Traces | 0.001s run time
def test_cursor():
&gt;       row = TestResultsRow(
            test_id="test",
            name="test",
            computed_name="test",
            updated_at=datetime.fromisoformat("2024-01-01T00:00:00Z"),
            commits_where_fail=1,
            failure_rate=0.5,
            avg_duration=100,
            last_duration=100,
            flake_rate=0.1,
            total_fail_count=1,
            total_flaky_fail_count=1,
            total_skip_count=1,
            total_pass_count=1,
        )
E       TypeError: TestResultsRow.__init__() got an unexpected keyword argument 'computed_name'

.../tests/unit/test_cursor.py:7: TypeError

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard

codecov-qa[bot] avatar Oct 17 '24 17:10 codecov-qa[bot]

Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time.

:x: Failed Test Results:

Completed 2707 tests with 1 failed, 2700 passed and 6 skipped.

View the full list of failed tests

pytest

  • Class name: utils.tests.unit.test_cursor
    Test name: test_cursor

    def test_cursor():
    > row = TestResultsRow(
    test_id="test",
    name="test",
    computed_name="test",
    updated_at=datetime.fromisoformat("2024-01-01T00:00:00Z"),
    commits_where_fail=1,
    failure_rate=0.5,
    avg_duration=100,
    last_duration=100,
    flake_rate=0.1,
    total_fail_count=1,
    total_flaky_fail_count=1,
    total_skip_count=1,
    total_pass_count=1,
    )
    E TypeError: TestResultsRow.__init__() got an unexpected keyword argument 'computed_name'

    .../tests/unit/test_cursor.py:7: TypeError

codecov-public-qa[bot] avatar Oct 17 '24 17:10 codecov-public-qa[bot]