[perf] Validation of args/kwargs seems to have a fairly large performance penalty (~35%) [v0.125]
The new validation of args and kwargs order in v0.125 slows down rendering by about 35% in some of my test cases (lots of very simple components).
I assume the reason for this feature is to give better feedback on exactly which syntax error the user made, but the problem is that the use of `eval()` is fairly slow.
Wouldn't it work to just check for `SyntaxError: positional argument follows keyword argument` and raise it manually, while `TypeError: foo() got multiple values for argument 'whatever'` could simply be raised by calling the render function with `*args` and `**kwargs`?
If I'm not missing anything, or at least not much, then I'd be happy to provide a small PR including some benchmarks for both v0.124 and my PR.
We can do so manually. I went with `eval()` because I wanted the input to `{% component %}` template tags to behave similarly to Python functions, but I didn't want to bother with replicating each error case separately. I just didn't realize `eval()` would be this slow.
I used `eval()` especially so that the generated function checks for duplicate keywords. For that, one cannot simply do

```python
func(*args, **kwargs)
```

because any duplicate keys would already be overwritten when squashed into `kwargs`. So the code that's being eval'd looks more like this:

```python
kwarg1 = {"key": 1}
kwarg2 = {"key": 2}
func(*args, **kwarg1, **kwarg2)
```

And it's that dynamic number of spreads that required `eval()`.
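For illustration, a minimal sketch of that eval-based approach (names like `check_call_via_eval` and the `kwarg_pairs` shape are hypothetical, not the actual django-components internals):

```python
def check_call_via_eval(func, args, kwarg_pairs):
    # kwarg_pairs is an ordered list of (key, value) tuples, which may
    # contain duplicate keys - exactly what squashing into a single
    # dict would silently swallow.
    scope = {"func": func, "args": args}
    spreads = []
    for i, (key, value) in enumerate(kwarg_pairs):
        scope[f"kwarg{i}"] = {key: value}
        spreads.append(f"**kwarg{i}")
    # Builds e.g. "func(*args, **kwarg0, **kwarg1)". Python itself then
    # raises TypeError for duplicate keywords, missing params, etc.
    eval(f"func(*args, {', '.join(spreads)})", scope)
```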
If we do so manually, then the cases to cover are at least:
- too few arguments (whether positional or keyword)
- too many arguments (whether positional or keyword)
- duplicate keyword arguments
- value given both positionally and as a keyword (that's the `got multiple values for argument 'whatever'` error)
- positional arguments after keyword arguments
After that, we can pass the data around as `*args` and `**kwargs`.
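A hedged sketch of what the manual check could look like (the `params` shape and the function name are hypothetical):

```python
def check_tag_inputs(params):
    # params: ordered list of ("arg", value) or ("kwarg", (key, value))
    # pairs, preserving the order written in the {% component %} tag.
    seen_keys = set()
    seen_kwarg = False
    for kind, payload in params:
        if kind == "arg":
            if seen_kwarg:
                raise SyntaxError("positional argument follows keyword argument")
        else:
            key, _value = payload
            if key in seen_keys:
                raise TypeError(f"got multiple values for keyword argument '{key}'")
            seen_keys.add(key)
            seen_kwarg = True
```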
Btw, is that ~35% you're seeing specifically due to validation, or due to `eval()` generally as a black box? When I did some profiling, I also saw `eval()` high up in the chart (71% in my case).
`eval()` is also used when calling `BaseNode.render()` on line 197, for the same purpose: to pass the args and kwargs in the same order as they were defined on the `{% component %}` tag.
- I think we can actually remove this. Since this is called after validation, it should be fine to remove the `eval()` here and simply collect the inputs into args and kwargs, and pass those to the render function as `*args, **kwargs`.
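Something like this minimal sketch (again with hypothetical names), run only after validation has passed:

```python
def call_render(render_fn, params):
    # params: the same ordered ("arg"/"kwarg", payload) pairs as above.
    # Duplicates and ordering errors were already caught by validation,
    # so squashing into a plain dict is now safe.
    args = []
    kwargs = {}
    for kind, payload in params:
        if kind == "arg":
            args.append(payload)
        else:
            key, value = payload
            kwargs[key] = value
    return render_fn(*args, **kwargs)
```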
So I assumed that the 71% I'm seeing for `eval()` is because it hides away all the rendering of the nested components.
~`validate_params`, on the other hand, seemed to have a smaller impact~.
```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1259    0.004    0.000    0.139    0.000 /Users/presenter/repos/django-components/src/django_components/util/template_tag.py:21(validate_params)
```
The above is for a page with 2 MB of HTML and ~360 components, with a total rendering time of 2.5 s on the first load.
1 s of the 2.5 s is spent in the tag parser (parsing the contents of `{% component %}`). This affects only the first load, and I already have a Rust parser written for it (just need to finish writing tests).
So we can assume that the effective rendering time in my example is 1.5 s.
If I'm reading that right, the cProfile output says that `validate_params` is almost 10%. So I agree, that is a big chunk.
My test case was the simplest component I have, an icon component with basically no logic, rendered 1000 times on an otherwise completely empty page. I don't think that's a very realistic case, but still, I'm just gonna try improving the situation and see if it leads to anything useful.
Thanks for your work on this package in general.
So I gave a go at replacing the `eval()` in validation with a manual implementation. This was a good task for an LLM (small, well defined, and transforming existing knowledge instead of creating new), so it was fairly easy to do.
However, the perf barely changed:
```
  1256    0.007    0.000    0.136    0.000 /Users/presenter/repos/django-components/src/django_components/util/template_tag.py:20(validate_params)
```
Looking closer, it seems that half of that, 0.065 s, is taken just by Python's introspection API (`inspect`'s `bind`, `apply_defaults`, and `signature`):
```
  1256    0.002    0.000    0.027    0.000 /usr/local/Cellar/python@3.11/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/inspect.py:3190(bind)
  1290    0.010    0.000    0.012    0.000 /usr/local/Cellar/python@3.11/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/inspect.py:2875(apply_defaults)
  1313    0.002    0.000    0.025    0.000 /usr/local/Cellar/python@3.11/3.11.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/inspect.py:3261(signature)
```
I think the validation is useful, and I also think that 10-30% is a lot.
- One way this could be reduced is by running the validation only when needed, e.g. when a template tag is inside a for loop (see the caching sketch after this list):

  ```django
  {% for i in list %}
    {% component "table" arg1 key=val %}{% endcomponent %}
  {% endfor %}
  ```

  Then even if we call the component a million times, it will always receive 1 positional arg (`arg1`) and 1 kwarg (`key=val`). So in this case it's sufficient to run the validation only once.

  However, this optimization is applicable only if the template tag does NOT contain any spread. E.g. in the following case:

  ```django
  {% for i in list %}
    {% component "table" ...arg1 key=val %}{% endcomponent %}
  {% endfor %}
  ```

  the `...arg1` may expand to maybe 1, maybe 2, maybe 50, or maybe 0 positional OR keyword arguments (all either positional or keyword, but not both simultaneously). So if a template tag accepts a spread, we have to check it every time.

  ~Unless... the template tag accepts `*args` and `**kwargs`~. Hm, no... Even if it had `*args, **kwargs`, we'd still need to check for duplicate keywords, or for args after kwargs (e.g. in case of multiple spreads like `{% my_tag ...kwargs ...args %}`).

  Anyway, this is blocked until I'm done implementing the "tag parser" (mentioned here), as that will change the logic upstream of this (it will replace the `tag_parser.py` file).
- Another thing I'm thinking: if the above won't help, then, as long as we can get the function signature cheaply, we could offload this to Rust.

- The `signature.bind` and `apply_defaults` methods are used to find out if a function was given too few / too many, etc. parameters. If I understand correctly, they "apply the defaults", the same as if we had a function like this:

  ```python
  def my_fn(a=10, b=20): ...
  ```

- BUT, for the validation's sake, we don't need the actual VALUES. It's sufficient to know WHETHER a parameter has a default.

- So, in theory, if we could cheaply get a function's signature as a string, e.g. `"my_fn(a = 10, b = 20)"`, then we could write a low-level function that would:
  - Parse the function string to learn that the function accepts params `a` and `b`, and that both of them have defaults.
  - Receive a list of (Python) parameters, and convert it to a simpler (Rust) internal representation. We just need to know: 1. whether it's an arg or a kwarg, 2. the key name (if kwarg), 3. its position (as set by the list).
  - Compare the two, and raise the appropriate TypeError.
  - If the validator did NOT raise an error, it would NOT return anything.
  - So, on the Python side, if there was no error, we can pass the (Python-side) args and kwargs to the component function as `fn(*args, **kwargs)`.
  - This way there would still be a pretty solid separation between the two. If, instead, we had to build and return Python objects from within the Rust impl, that'd feel messier.
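Here's the caching idea from the first bullet above as a minimal sketch. It's hypothetical (`maybe_validate`, `node_id`, and `has_spread` are not actual django-components APIs), assuming we can tell at parse time whether a tag uses a spread:

```python
# Node IDs of template tags whose inputs were already validated once.
_validated_node_ids = set()

def maybe_validate(node_id, has_spread, run_validation):
    if has_spread:
        # Spreads can expand to a different number of args/kwargs on
        # every render, so we must always validate.
        run_validation()
    elif node_id not in _validated_node_ids:
        # Static args/kwargs: the shape can't change between renders,
        # so validating once per template node is enough.
        run_validation()
        _validated_node_ids.add(node_id)
```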
Btw @oliverhaas thanks again for flagging this. Feel free to do more perf hunting. 😄
I reckon that if we decide to move to Rust not only the extremely slow parts (like the HTML parser was), but also this kind of "mid-sized" optimization (10-30%), then there will probably be more functions that can be optimized. This would be similar to what I assume Pydantic does with their core library.
I might be missing something here, but can we do `my_func.__code__.co_varnames`?
```
$ python -m timeit -s "def my_func(a=4, b='hello'): ..." "from inspect import signature; signature(my_func)"
50000 loops, best of 5: 4.91 usec per loop

$ python -m timeit -s "def my_func(a=4, b='hello'): ..." "my_func.__code__.co_varnames"
10000000 loops, best of 5: 30.5 nsec per loop
```
@EmilStenstrom Didn't know about that!
I may be wrong, but I thought the `co_xxx` attributes are implementation-specific (CPython), so I'd rather avoid using them. Chatbots agree with me, though after just a quick search I couldn't find it clearly stated in the Python documentation or online whether that's true.
Chatbots also suggest that, if not relying on `co_xxx`, there doesn't seem to be any other way to get a function's signature (even just as a string) other than using `inspect`. So in that case we'd have to use `inspect.signature()` (or similar).
However, I also found that the defaults are actually stored under `__defaults__`. That could be used together with the function signature to detect which args / kwargs are required. Thus we could get rid of `apply_defaults` and `bind`, shaving possibly up to 39 ms off the total 136 ms.
I just quickly tested PyPy, and it has the exact same API as CPython. It would of course be possible to try other implementations as well if we wanted. Not sure. We currently don't test other distributions in our test matrix, and reading this issue it seems the speed penalty is real here.
But if we want to be really sure to support other Python distributions, we could maybe check for `co_varnames` and fall back to `inspect` if it doesn't exist?
Below is a slightly more worked example. `co_varnames` contains all local variables, so you need to pick the first `co_argcount` ones.
```python
>>> def get_params_with_defaults(func):
...     code = func.__code__
...     arg_count = code.co_argcount
...     # Get all parameter names
...     param_names = code.co_varnames[:arg_count]
...
...     # Get defaults (if any)
...     defaults = func.__defaults__ or ()
...     # Map the defaults to the corresponding parameter names.
...     # Defaults correspond to the last len(defaults) parameters.
...     default_mapping = dict(zip(param_names[-len(defaults):], defaults))
...
...     return param_names, default_mapping
...
>>> def my_func(a=4, b='hello'): c = 1
...
>>> get_params_with_defaults(my_func)
(('a', 'b'), {'a': 4, 'b': 'hello'})
```
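And a sketch of the fallback idea from above: trust the `co_xxx` attributes when they exist, and otherwise pay the `inspect` cost (a hypothetical helper, not necessarily what would land in the PR):

```python
import inspect

def get_param_names(func):
    # Fast path: CPython-style code object attributes.
    code = getattr(func, "__code__", None)
    if code is not None and hasattr(code, "co_varnames"):
        return code.co_varnames[:code.co_argcount]
    # Portable fallback: slower, but works on any implementation
    # that supports the inspect module.
    return tuple(inspect.signature(func).parameters)
```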
@EmilStenstrom Love that idea, I implemented it here https://github.com/django-components/django-components/pull/945.
In my project, this brings the param validation down to 20-22 ms, compared to the initial ~136-139 ms. So that's about a 6-7x reduction.
I think it makes sense to keep this ticket open until after I've upgraded that tag parser, so we do validation only once if the template tag has no spreads, as that could also save a lot of recomputation.
@JuroOravec awesome to hear that it worked! :)
@JuroOravec This is solved too right?
@EmilStenstrom Yup, but I still want to address this point before we close this one:
> I think it makes sense to keep this ticket open until after I've upgraded that tag parser, so we do validation only once if the template tag has no spreads, as that could also save a lot of recomputation.
Just some numbers from fiddling around with this over the last few days and just now (not double-checked):
```
# 0.127
Basically empty component: 0.1647451100870967 ms per iteration
Slotted component: 0.6152174799935892 ms per iteration
Small component: 0.1817340349080041 ms per iteration

# 0.127 but manually removed validation
Basically empty component: 0.10124936408828944 ms per iteration
Slotted component: 0.26561844593379647 ms per iteration
Small component: 0.10970560903660953 ms per iteration

# 0.128
Basically empty component: 0.11050817200157326 ms per iteration
Slotted component: 0.30305335199227557 ms per iteration
Small component: 0.11696803499944508 ms per iteration
```
The slotted and small components were from the existing benchmarks in the repository, but without JS and CSS files. The basically empty component is mine:
```python
class BasicallyEmptyComponent(Component):
    template: types.django_html = """
        hello world
    """

    def get_context_data(self):
        return {}
```
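For context, the "ms per iteration" numbers come from a loop roughly like this hypothetical harness (the component name, its registration, and Django setup are assumptions, not the repo's actual benchmark code):

```python
import time
from django.template import Context, Template

# Assumes Django settings are configured and the component above is
# registered under the (hypothetical) name "basically_empty".
template = Template(
    '{% load component_tags %}'
    '{% component "basically_empty" %}{% endcomponent %}'
)

N = 1000
start = time.perf_counter()
for _ in range(N):
    template.render(Context({}))
elapsed_ms = (time.perf_counter() - start) / N * 1000
print(f"Basically empty component: {elapsed_ms} ms per iteration")
```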
So just to say: Good stuff @JuroOravec.
Btw, @oliverhaas, if that's OK, I'd reopen this issue to keep track of this 😄
> I think it makes sense to keep this ticket open until after I've upgraded that tag parser, so we do validation only once if the template tag has no spreads, as that could also save a lot of recomputation.