These changes switch flagpole from using pydantic as the base classes to dataclasses from stdlib. During the pydantic 2 upgrade the time to parse features shot way up which was unexpected. We have also not been as impressed with feature flag match times and suspected that pydantic might be contributing overhead.

The changes of this pull request re-implement flagpole with basic python dataclasses. The new implementation has reduced time to build feature flags, which should help improve the overall performance of our feature flagging. Improvements were measured both with cProfile, and local mini-benchmarks. (scripts provided below)

cProfile results

The cprofile script builds a feature 1000 times and collects profiling data from those operations.

current master (pydantic)

         9004 function calls in 1.125 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.125    1.125 <string>:1(<module>)
     1000    1.123    0.001    1.124    0.001 __init__.py:117(from_feature_dictionary)
        1    0.001    0.001    1.125    1.125 flagpole-profile:21(main)
     2000    0.001    0.000    0.001    0.000 typing.py:2424(get_origin)
        1    0.000    0.000    1.125    1.125 {built-in method builtins.exec}
     6000    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

after (dataclasses)

         41004 function calls in 0.009 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.010    0.010 <string>:1(<module>)
     3000    0.000    0.000    0.000    0.000 <string>:2(__init__)
     1000    0.001    0.000    0.009    0.000 __init__.py:128(from_feature_dictionary)
     1000    0.001    0.000    0.008    0.000 __init__.py:134(<listcomp>)
     2000    0.002    0.000    0.004    0.000 conditions.py:193(_condition_from_dict)
     3000    0.002    0.000    0.007    0.000 conditions.py:217(from_dict)
     3000    0.000    0.000    0.004    0.000 conditions.py:219(<listcomp>)
     2000    0.000    0.000    0.000    0.000 enum.py:1093(__new__)
     2000    0.000    0.000    0.000    0.000 enum.py:1255(value)
     2000    0.000    0.000    0.000    0.000 enum.py:193(__get__)
     2000    0.000    0.000    0.001    0.000 enum.py:686(__call__)
        1    0.000    0.000    0.010    0.010 flagpole-profile:21(main)
        1    0.000    0.000    0.010    0.010 {built-in method builtins.exec}
     1000    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
    19000    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}

While significantly more functions were called the overall runtime is much better.

flagpole-profile script

#!/usr/bin/env python

from sentry.runner import configure
from flagpole import Feature as FlagpoleFeature
import cProfile

configure()

feature_names = (
    # Largest feature flag is 2688 bytes
    "organizations:user-feedback-ui",
    # 404 bytes is around median
    "organizations:performance-chart-interpolation",
    # 97 bytes is smallest feature.
    "organizations:new-weekly-report",
)

feature_config = {}


def main():
    feature_name = "organizations:user-feedback-ui"
    # feature_name = "organizations:performance-chart-interpolation"
    # feature_name = "organizations:new-weekly-report"
    for _ in range(1000):
        FlagpoleFeature.from_feature_dictionary(feature_name, feature_config[feature_name])


if __name__ == "__main__":
    from sentry import options

    for feature_name in feature_names:
        feature_config[feature_name] = options.get(f"feature.{feature_name}")

    cProfile.run('main()')

Micro benchmark

In the microbenchmark I looked at 3 feature flags (the max, approximate median and smallest features). Each feature would be parsed 10000 times and I looked at the min, mean and max duration for building a Feature object from dictionary data loaded from options seeded with the current feature flag inventory.

before (pydantic)

Results for organizations:user-feedback-ui

option load_time 0.0018911361694335938

build_time min 0.0004580021
build_time max 0.0011467934
build_time mean 0.0004902341

RSS memory usage 272400384

Results for organizations:performance-chart-interpolation

option load_time 0.0030400753021240234

build_time min 0.0000336170
build_time max 0.0001301765
build_time mean 0.0000367536

RSS memory usage 272400384

Results for organizations:new-weekly-report

option load_time 0.0022568702697753906

build_time min 0.0000057220
build_time max 0.0000231266
build_time mean 0.0000069423

RSS memory usage 272400384

after (dataclasses)

Results for organizations:user-feedback-ui

option load_time 0.0033750534057617188

build_time min 0.0000026226
build_time max 0.0000209808
build_time mean 0.0000032377

RSS memory usage 276054016

Results for organizations:performance-chart-interpolation

option load_time 0.0033571720123291016

build_time min 0.0000016689
build_time max 0.0000610352
build_time mean 0.0000028541

RSS memory usage 276054016

Results for organizations:new-weekly-report

option load_time 0.003008127212524414

build_time min 0.0000000000
build_time max 0.0000047684
build_time mean 0.0000007447

RSS memory usage 276070400

flagpole-timing script

#!/usr/bin/env python

from sentry.runner import configure
from flagpole import Feature as FlagpoleFeature
import gc
import statistics
import time
import psutil

configure()


def main():
    from sentry import options
    gc.disable()

    feature_names = (
        # Largest feature flag is 2688 bytes
        "organizations:user-feedback-ui",
        # 404 bytes is around median
        "organizations:performance-chart-interpolation",
        # 97 bytes is smallest feature.
        "organizations:new-weekly-report",
    )
    for feature_name in feature_names:
        load_start = time.time()
        option_val = options.get(f"feature.{feature_name}")
        load_end = time.time()

        build_durations = []
        for _ in range(0, 10000):
            build_start = time.time()
            FlagpoleFeature.from_feature_dictionary(feature_name, option_val)
            build_end = time.time()
            build_durations.append(build_end - build_start)

        load_time = load_end - load_start
        rss_mem = psutil.Process().memory_info().rss

        print("")
        print(f"Results for {feature_name}")
        print("")
        print(f"option load_time {load_time}")
        print("")
        print("build_time min", "{:.10f}".format(min(build_durations)))
        print("build_time max", "{:.10f}".format(max(build_durations)))
        print("build_time mean", "{:.10f}".format(statistics.mean(build_durations)))
        print("")
        print(f"RSS memory usage {rss_mem}")


if __name__ == "__main__":
    main()

Again we see significant improvements in runtime without any impact in memory usage.

What we lose?

While moving to dataclasses gives us some gains in performance it has a few drawbacks:

The dataclass implementation isn't as typesound as pydantic would be. Each and every property is not validated for type correctness.
Updating the jsonschema will be manual going forward. Pydantic has a great integration with jsonschema that allows you to generate jsonschema documents from objects. Dataclasses do not.
Drift between the schema and implementation could occur. Because the jsonschema and code are not connected they could drift in the future.

Aug 08 '24 19:08 markstory

🚨 Warning: This pull request contains Frontend and Backend changes!

It's discouraged to make changes to Sentry's Frontend and Backend in a single pull request. The Frontend and Backend are not atomically deployed. If the changes are interdependent of each other, they must be separated into two pull requests and be made forward or backwards compatible, such that the Backend or Frontend can be safely deployed independently.

Have questions? Please ask in the #discuss-dev-infra channel.

Aug 08 '24 19:08 github-actions[bot]

Test Failures Detected: Due to failing tests, we cannot provide coverage reports at this time.

:x: Failed Test Results:

Completed 21719 tests with 14 failed, 21504 passed and 201 skipped.

View the full list of failed tests

pytest

Class name: tests.flagpole.test_conditions.TestContainsConditions
Test name: test_does_contain Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:104: in test_does_contain
    condition = ContainsCondition(property="foo", value="bar")
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestContainsConditions
Test name: test_does_not_contain Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:120: in test_does_not_contain
    condition = ContainsCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestContainsConditions
Test name: test_invalid_property_provided Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:136: in test_invalid_property_provided
    condition = ContainsCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestContainsConditions
Test name: test_missing_context_property Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:149: in test_missing_context_property
    condition = ContainsCondition(property="foo", value="bar")
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestEqualsConditions
Test name: test_equality_type_mismatch_strings Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:193: in test_equality_type_mismatch_strings
    condition = EqualsCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestEqualsConditions
Test name: test_is_equal_case_insensitive Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:183: in test_is_equal_case_insensitive
    condition = EqualsCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestEqualsConditions
Test name: test_is_equal_string Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:165: in test_is_equal_string
    condition = EqualsCondition(property="foo", value=value)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestEqualsConditions
Test name: test_is_not_equals Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:175: in test_is_not_equals
    condition = EqualsCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestInConditions
Test name: test_invalid_property_value Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:75: in test_invalid_property_value
    condition = InCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestInConditions
Test name: test_is_in Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:33: in test_is_in
    condition = InCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestInConditions
Test name: test_is_in_case_insensitivity Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:65: in test_is_in_case_insensitivity
    condition = InCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestInConditions
Test name: test_is_in_numeric_string Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:50: in test_is_in_numeric_string
    condition = InCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestInConditions
Test name: test_is_not_in Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:57: in test_is_not_in
    condition = InCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Class name: tests.flagpole.test_conditions.TestInConditions
Test name: test_missing_context_property Flags:

backend

#x1B[1m#x1B[31mtests/flagpole/test_conditions.py#x1B[0m:91: in test_missing_context_property
    in_condition = InCondition(property="foo", value=values)
#x1B[1m#x1B[31mE   TypeError: ConditionBase.__init__() missing 1 required positional argument: 'operator'#x1B[0m

Aug 08 '24 19:08 codecov[bot]

feat(features) Use dataclasses for flagpole instead of pydantic

cProfile results

Micro benchmark

What we lose?

:x: Failed Test Results:

pytest