
Optimize @callable_cached with zero-argument fast path

Open bitranox opened this issue 3 months ago • 4 comments

Probably my last PR for some time; I need to work on something else. I have excluded the unwanted LLM-CONTEXT directory and the .claude instructions.

I tried to incorporate the valuable comments from @glinte and @JWCS; I hope you like this version better.

Implements the optimization proposed in the FIXME comment (utilcachecall.py:14-19). The @callable_cached decorator now automatically detects zero-argument functions at decoration time and routes them to an optimized fast path.

Implementation:

  • Refactored callable_cached() to inspect function code objects
  • Added _callable_cached_zero_args() - optimized fast path
    • Simple dict with 'value'/'exception' keys (no argument handling)
    • Eliminates *args tuple creation and flattening overhead
    • Direct dict access instead of .get() method calls
  • Added _callable_cached_general() - original implementation for multi-arg functions
    • Extracted from callable_cached() with no functional changes
    • Maintains 100% backward compatibility
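
The approach above can be sketched roughly as follows. This is a hypothetical illustration, not beartype's actual implementation: the decorator name, the dict-based cache keys, and the elided general path are all stand-ins for the real utilcachecall.py code.

```python
import inspect
from functools import wraps

def callable_cached_sketch(func):
    '''Hypothetical sketch of decoration-time zero-argument detection.'''
    code = func.__code__
    # A callable is zero-argument if it declares no positional or
    # keyword-only parameters and accepts neither *args nor **kwargs.
    is_zero_arg = (
        code.co_argcount == 0 and
        code.co_kwonlyargcount == 0 and
        not (code.co_flags & (inspect.CO_VARARGS | inspect.CO_VARKEYWORDS))
    )
    if not is_zero_arg:
        # The general multi-argument path is elided in this sketch.
        return func

    # Simple dict populated with a 'value' or 'exception' key on first call.
    # No *args tuple creation or key flattening is needed on the fast path.
    cache = {}

    @wraps(func)
    def wrapper():
        if 'exception' in cache:
            raise cache['exception']
        if 'value' not in cache:
            try:
                cache['value'] = func()
            except Exception as exception:
                cache['exception'] = exception
                raise
        return cache['value']
    return wrapper
```

Because the detection runs once at decoration time, zero-argument callables pay none of the general path's per-call argument handling.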

Performance (CPython 3.14.0):

  • Zero-arg functions: 49.2 ns/call (1.3x speedup)
  • Multi-arg functions: 156.7 ns/call (no regression)
  • Expected on GraalPy: 4.6x speedup for zero-arg functions
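
A minimal harness along these lines can reproduce such measurements; absolute numbers depend heavily on machine and interpreter, so treat the figures above as indicative only. The closure below is a hypothetical stand-in for a @callable_cached zero-arg function.

```python
import timeit

def make_cached_zero_arg():
    # Closure-based stand-in for a cached zero-argument function.
    cache = {}
    def cached():
        if 'value' not in cache:
            cache['value'] = 42
        return cache['value']
    return cached

fn = make_cached_zero_arg()
number = 100_000
total_seconds = timeit.timeit(fn, number=number)
ns_per_call = total_seconds / number * 1e9
print(f'zero-arg cached call: {ns_per_call:.1f} ns/call')
```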

Testing:

  • All unit tests pass (393 passed, 26 skipped)
  • No breaking changes or API modifications
  • Transparent automatic optimization

Benefits:

  • Affects 37+ zero-argument @callable_cached functions in beartype
  • Particularly benefits is_python_pypy(), a future is_python_graalpy(), and platform detection
  • Zero runtime overhead (detection at decoration time only)

Also adds .claude/ to .gitignore for consistency with the existing .gemini/ entry. It would be nice to also keep LLM-CONTEXT/ in .gitignore, so I would have a place to store context at least on my local branch. My personal preference would be to keep those contexts and analysis scripts in beartype-test/LLM-CONTEXT/<issue>/ and exclude beartype-test/LLM-CONTEXT from tests, but it looks like y'all vote against that.


Alternative LRU-Based Implementation

During the optimization work on @callable_cached, I developed an alternative implementation that extends Python's stdlib functools.lru_cache instead of using custom dictionary-based caching. This approach was fully implemented, tested, and benchmarked, but ultimately not deployed due to a small performance trade-off, and because I learned that you don't like big changes. However, it would offer significant architectural advantages that may be valuable for future consideration.

What Was Built

A complete replacement for @callable_cached that:

  • Wraps functools.lru_cache as its foundation
  • Preserves all existing features (exception caching, unhashable handling, zero-arg optimization)
  • Adds new capabilities from LRU caching (memory management, introspection, cache clearing)
  • Maintains 100% backward compatibility with existing code
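
Since functools.lru_cache does not cache exceptions on its own, such a wrapper has to memoize the call outcome itself. A hedged sketch of the idea (the decorator name and outcome-pair encoding are illustrative, not the shelved implementation):

```python
import functools

def callable_cached_lru(func):
    # Sketch: build exception-caching memoization on top of
    # functools.lru_cache by memoizing an (ok, payload) outcome pair,
    # so both return values and raised exceptions are cached.
    @functools.lru_cache(maxsize=None)
    def _memo(*args):
        try:
            return (True, func(*args))
        except Exception as exception:
            return (False, exception)

    @functools.wraps(func)
    def wrapper(*args):
        ok, payload = _memo(*args)
        if ok:
            return payload
        raise payload

    # Expose the stdlib introspection and clearing hooks unchanged,
    # giving callers cache_info() observability and cache_clear() for
    # test isolation.
    wrapper.cache_info = _memo.cache_info
    wrapper.cache_clear = _memo.cache_clear
    return wrapper
```

Passing a bounded maxsize instead of None is what would provide the memory-safety benefit listed below, at the cost of LRU eviction bookkeeping on every call.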

Benefits:

  • Memory safety (prevents unbounded growth)
  • Production observability (cache_info)
  • Test isolation (cache_clear)
  • Simpler codebase (-390 lines)
  • Future-proof (built on stdlib)
  • Standards-compliant API

Why is it slightly slower (around 2%)?

  1. LRU bookkeeping overhead: Tracking access order for eviction
  2. Additional features: Thread-safety, statistics tracking
  3. Wrapper layers: Extra function call layer for extended functionality

greetings from lovely Vienna

Robert

bitranox avatar Nov 15 '25 14:11 bitranox

Codecov Report

:x: Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 94.50%. Comparing base (aa872c5) to head (ed4d5f2).

Files with missing lines Patch % Lines
beartype/_util/cache/utilcachecall.py 87.09% 4 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #590      +/-   ##
==========================================
- Coverage   94.53%   94.50%   -0.04%     
==========================================
  Files         307      307              
  Lines       10559    10572      +13     
==========================================
+ Hits         9982     9991       +9     
- Misses        577      581       +4     

:umbrella: View full report in Codecov by Sentry.

codecov[bot] avatar Nov 15 '25 14:11 codecov[bot]

I like the size of this PR. The code changes seem good.

Glinte avatar Nov 16 '25 06:11 Glinte

Thanks so much for the tremendous volunteerism, everybody. @beartype loves you guys! Okay. I admit. It probably seems like @beartype is ignoring you guys. It only looks like that, though. You can't believe what your eyes are telling you. :smiling_face_with_tear:

Seriously. Here's where I apologize. I'm sorry for neglecting both these amazing PRs, @bitranox, and your incisive reviews, @Glinte. The issue tracker continues blowing up about years-old issues I should've resolved years ago. Apparently, problems don't go away even when you ignore them. I ignored them so hard. Yet the problems festered. This is a teaching moment:

Don't do what @leycec did, which was nothing. Actually do something. Like, anything!

Now I'm on the QA hook (like a squirming worm) to resolve both #423 and #592 for an upcoming @beartype 0.22.6 release. Oh, Canadian Gods. How are we at the sixth patch release of this unceasing dev cycle already... It's time to sigh, face. :face_exhaling:

leycec avatar Nov 18 '25 05:11 leycec

Addendum: Let's see here...

seems like it is quite easy to add a guard for having no keyword args in the function to be cached, perhaps you want to add that?

Ah, hah! The age-old question. This is surprisingly non-trivial, interestingly. Boring reasons why include:

  • Circular imports. The beartype._util.cache.utilcachecall submodule is really early-time. It can't safely import from anything else in the @beartype codebase, because literally everything else in the @beartype codebase imports that submodule. That's more an annoyance than a hard blocker to doing non-trivial stuff in that submodule. Still, it's best not to get too magical there. In this case, the existing beartype._util.func.arg.utilfuncargtest submodule defines a number of testers that would be useful to call in the @callable_cached decorator to guard against keyword parameters (e.g., is_func_arg_variadic_keyword()). But we can't safely import that submodule in @callable_cached. So, we'd have to manually copy-paste that functionality over. Whatevahs! Definitely doable. Just... irksome. It irks some. Black-out drunken face ensues: :woozy_face:
  • False positives. How exactly do you "add a guard for having no keyword args in the function to be cached," anyway? Unless explicitly restricted in the decorated function signature (e.g., with positional- or keyword-only parameters), Python permits callers to flexibly pass any parameter either positionally or by keyword. The only use cases the @callable_cached decorator can explicitly guard against are:
    • Keyword-only parameters. @callable_cached can detect that the decorated function accepts keyword-only parameters with weirdo one-liner unreadable boolean logic resembling bool(func.__code__.co_kwonlyargcount).
    • Variadic keyword parameters. @callable_cached can detect that the decorated function accepts a variadic keyword parameter with weirdo one-liner unreadable boolean logic resembling bool(func.__code__.co_flags & inspect.CO_VARKEYWORDS). Weirdo stuff, but it works. @beartype is one giant monolith of weirdo stuff. What's one more on the pile? :laughing:
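
For concreteness, the two checks above as standalone testers; the function names here are illustrative stand-ins, not beartype's actual utilfuncargtest helpers.

```python
import inspect

def func_has_kwonly_params(func):
    # True only if the passed function declares keyword-only parameters
    # (e.g., parameters after a bare '*' in the signature).
    return bool(func.__code__.co_kwonlyargcount)

def func_has_var_keyword(func):
    # True only if the passed function accepts a variadic **kwargs
    # parameter, detected via the CO_VARKEYWORDS code-object flag.
    return bool(func.__code__.co_flags & inspect.CO_VARKEYWORDS)
```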

Unfortunately, that's all @callable_cached can do. Actually, @callable_cached could go one step further. @callable_cached could explicitly force all decorated callables to accept only positional-only parameters. In a certain sense, this isn't hard. Just assert that func.__code__.co_posonlyargcount > 0 while everything else is 0. Okay. Great. In a certain other sense, though, this is hard. This requires us to refactor literally the entire codebase to rewrite memoized callable signatures to accept only positional-only parameters. That's... the hard part. It's miserable painstaking work. Misery, I invoke thee! :magic_wand: :hurtrealbad:
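
That assertion could look something like this hypothetical tester, which checks that every declared parameter is positional-only and that the function accepts nothing else:

```python
import inspect

def func_accepts_only_posonly_params(func):
    # True only if every declared parameter of the passed function is
    # positional-only (i.e., before a '/' in the signature) and the
    # function declares neither keyword-only nor variadic parameters.
    code = func.__code__
    return (
        code.co_posonlyargcount == code.co_argcount and
        code.co_kwonlyargcount == 0 and
        not (code.co_flags & (inspect.CO_VARARGS | inspect.CO_VARKEYWORDS))
    )
```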

The gains are also pretty minimal. Why? Because Python itself already guards against passing keyword parameters to @callable_cached-decorated callables, albeit at a later time. When you intentionally define a decorator to only accept positional parameters like this...

def _callable_cached(*args): ...

...then Python itself raises exceptions like this when you try to pass that callable a keyword parameter:

>>> def _callable_cached(*args): ...
>>> _callable_cached(super_nifty_kwarg='Do the impossible, Python. I dare you.')
Traceback (most recent call last):
  File "/home/leycec/tmp/mopy.py", line 4, in <module>
    _callable_cached(super_nifty_kwarg='Do the impossible, Python. I dare you.')
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: _callable_cached() got an unexpected keyword argument 'super_nifty_kwarg'

In other words, it's probably best just to let Python itself continue to do the heavy parsing lifting here. Whenever I see the above error, I inwardly groan as I realize what I've done. I committed an optimization sin. You can tell, because the exception message is unreadable and makes no sense. :rofl:

Thanks again for everything, everybody! Someday, let us build a utopian future of wonder and joy by merging this infinite heap of PRs. Who knew that utopia could be built just by merging PRs!? It's true, though. I'm pretty sure it's true.

leycec avatar Nov 18 '25 05:11 leycec