Bench large_array_vs_numpy.py is raising a TypeError
With latest numexpr:
> python bench/large_array_vs_numpy.py
<snip>
Exception in thread Thread-16 (benchmark_numexpr_re_evaluate):
Traceback (most recent call last):
File "/Users/faltet/miniforge3/envs/blosc2/lib/python3.13/threading.py", line 1043, in _bootstrap_inner
self.run()
~~~~~~~~^^
File "/Users/faltet/miniforge3/envs/blosc2/lib/python3.13/threading.py", line 994, in run
self._target(*self._args, **self._kwargs)
~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/faltet/blosc/numexpr/bench/large_array_vs_numpy.py", line 96, in benchmark_numexpr_re_evaluate
time_taken = timeit.timeit(
lambda: ne.re_evaluate(
...<2 lines>...
number=num_runs,
)
File "/Users/faltet/miniforge3/envs/blosc2/lib/python3.13/timeit.py", line 237, in timeit
return Timer(stmt, setup, timer, globals).timeit(number)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
File "/Users/faltet/miniforge3/envs/blosc2/lib/python3.13/timeit.py", line 180, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "/Users/faltet/blosc/numexpr/bench/large_array_vs_numpy.py", line 97, in <lambda>
lambda: ne.re_evaluate(
~~~~~~~~~~~~~~^
local_dict={"a": a[start:end], "b": b[start:end], "c": c[start:end]}
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
),
^
File "/Users/faltet/blosc/numexpr/numexpr/necompiler.py", line 1051, in re_evaluate
args = getArguments(argnames, local_dict, global_dict, _frame_depth=_frame_depth)
File "/Users/faltet/blosc/numexpr/numexpr/necompiler.py", line 774, in getArguments
for name in names:
^^^^^
TypeError: 'NoneType' object is not iterable
numexpr time (threaded with re_evaluate over 32 chunks with 2 threads): 2.748033 seconds
numexpr speedup: 5.48x
This used to work before. The benchmark was introduced in PR #496, so something must have broken recently. @emmaai, if you have any clue about what's going on, that would be great.
I'll have a look in the next few days. I haven't moved to 3.13 yet.
Just some more information: I tested with Python 3.10, checked out at commit a99412e9, where PR #496 was merged:
~/Work/Workplace/Python/numexpr @a99412e9 *1 ?6 ne 13:04:56
❯ python bench/large_array_vs_numpy.py
Benchmarking Expression 1:
NumPy time (threaded over 32 chunks with 2 threads): 4.709605 seconds
Exception in thread Thread-4 (benchmark_numexpr_re_evaluate):
Traceback (most recent call last):
File "/Users/liuteng/opt/anaconda3/envs/ne/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/Users/liuteng/opt/anaconda3/envs/ne/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/Users/liuteng/Work/Workplace/Python/numexpr/bench/large_array_vs_numpy.py", line 94, in benchmark_numexpr_re_evaluate
time_taken = timeit.timeit(
File "/Users/liuteng/opt/anaconda3/envs/ne/lib/python3.10/timeit.py", line 234, in timeit
return Timer(stmt, setup, timer, globals).timeit(number)
File "/Users/liuteng/opt/anaconda3/envs/ne/lib/python3.10/timeit.py", line 178, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "/Users/liuteng/Work/Workplace/Python/numexpr/bench/large_array_vs_numpy.py", line 95, in <lambda>
lambda: ne.re_evaluate(
File "/Users/liuteng/Work/Workplace/Python/numexpr/numexpr/necompiler.py", line 1002, in re_evaluate
args = getArguments(argnames, local_dict, global_dict, _frame_depth=_frame_depth)
File "/Users/liuteng/Work/Workplace/Python/numexpr/numexpr/necompiler.py", line 760, in getArguments
for name in names:
TypeError: 'NoneType' object is not iterable
numexpr time (threaded with re_evaluate over 32 chunks with 2 threads): 2.472811 seconds
numexpr speedup: 1.90x
If we change this line in the benchmark (https://github.com/pydata/numexpr/blob/master/bench/large_array_vs_numpy.py#L81) from
if index == 0:
to
if index == 0 or index == 1:
the problem seems to be solved.
Here index refers to the chunk index (we divide the large array into 32 chunks and assign even-index chunks to thread 0 and odd-index chunks to thread 1). The idea was to evaluate the first chunk and then re_evaluate from then on. However, since the cache is thread-local, the argument-name cache is not shared between the two threads (thread 0 and thread 1). So it seems we need to evaluate chunk 0 in thread 0 and chunk 1 in thread 1, so that each thread primes its own cache.
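The failure mode can be modeled with a small stdlib-only sketch (evaluate and re_evaluate here are toy stand-ins for numexpr's functions, not its actual implementation): a threading.local cache primed by one thread is invisible to every other thread, so a thread that only ever calls re_evaluate finds None where the argument names should be.

```python
import threading

# Thread-local storage modeling numexpr's per-thread compiled-expression cache.
_cache = threading.local()

def evaluate(argnames):
    # "Compile" the expression: remember its argument names in this thread only.
    _cache.argnames = argnames

def re_evaluate(local_dict):
    # Mirrors getArguments(): iterate over the cached argument names.
    names = getattr(_cache, "argnames", None)
    return [local_dict[name] for name in names]  # TypeError if names is None

# Thread 0 (here, the main thread) evaluated first, so its cache is primed.
evaluate(["a", "b"])
print(re_evaluate({"a": 1, "b": 2}))  # [1, 2]

# Thread 1 never called evaluate(), so its thread-local cache is empty and
# re_evaluate() fails just like in the benchmark.
result = {}
def worker():
    try:
        re_evaluate({"a": 1, "b": 2})
    except TypeError as exc:
        result["error"] = str(exc)

t = threading.Thread(target=worker)
t.start()
t.join()
print(result["error"])  # 'NoneType' object is not iterable
```

This matches the proposed benchmark fix: each worker thread must run a plain evaluate once (chunk 0 for thread 0, chunk 1 for thread 1) before switching to re_evaluate.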
The error was introduced by this PR/commit: https://github.com/pydata/numexpr/commit/33ee71b0c6f13224f3031cd8b42921c748ce9ede
with this block specifically
https://github.com/pydata/numexpr/blob/33ee71b0c6f13224f3031cd8b42921c748ce9ede/numexpr/necompiler.py#L776-L783
It addressed the race condition by "localising everything", but that removed the global shared cache of compiled expressions. Basically, it means no re_evaluate across threads. My change operated on the assumption that the compiled expression should be globally shared and cached, and that only the inputs (i.e. local_dict) should be localised within each thread; hence the context-aware dict was introduced. I'm not sure about the rationale for localising everything without caching; it looks like a new assumption to me. I'm happy to change the benchmark code to make it work if people agree on this new assumption.
Yes, that commit was added as part of the support for free-threaded Python, so you can proceed with this new assumption.