numexpr

Bench large_array_vs_numpy.py is raising a TypeError

Open · FrancescAlted opened this issue 1 month ago · 5 comments

With the latest numexpr:

> python bench/large_array_vs_numpy.py
<snip>
Exception in thread Thread-16 (benchmark_numexpr_re_evaluate):
Traceback (most recent call last):
  File "/Users/faltet/miniforge3/envs/blosc2/lib/python3.13/threading.py", line 1043, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/Users/faltet/miniforge3/envs/blosc2/lib/python3.13/threading.py", line 994, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/faltet/blosc/numexpr/bench/large_array_vs_numpy.py", line 96, in benchmark_numexpr_re_evaluate
    time_taken = timeit.timeit(
        lambda: ne.re_evaluate(
    ...<2 lines>...
        number=num_runs,
    )
  File "/Users/faltet/miniforge3/envs/blosc2/lib/python3.13/timeit.py", line 237, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/Users/faltet/miniforge3/envs/blosc2/lib/python3.13/timeit.py", line 180, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "/Users/faltet/blosc/numexpr/bench/large_array_vs_numpy.py", line 97, in <lambda>
    lambda: ne.re_evaluate(
            ~~~~~~~~~~~~~~^
        local_dict={"a": a[start:end], "b": b[start:end], "c": c[start:end]}
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ),
    ^
  File "/Users/faltet/blosc/numexpr/numexpr/necompiler.py", line 1051, in re_evaluate
    args = getArguments(argnames, local_dict, global_dict, _frame_depth=_frame_depth)
  File "/Users/faltet/blosc/numexpr/numexpr/necompiler.py", line 774, in getArguments
    for name in names:
                ^^^^^
TypeError: 'NoneType' object is not iterable
numexpr time (threaded with re_evaluate over 32 chunks with 2 threads): 2.748033 seconds
numexpr speedup: 5.48x

This used to work. This benchmark was introduced in PR #496, so something we changed recently must have broken it. @emmaai, if you have any clue about what's going on, that would be great.

FrancescAlted, Dec 03 '25 07:12

I'll have a look in the next few days. I haven't moved to 3.13 yet.

emmaai, Dec 03 '25 12:12

Some more information: I tested with Python 3.10, checked out at commit @a99412e9, where PR #496 was merged:

~/Work/Workplace/Python/numexpr @a99412e9 *1 ?6                                                                      ne 13:04:56
❯ python bench/large_array_vs_numpy.py
Benchmarking Expression 1:
NumPy time (threaded over 32 chunks with 2 threads): 4.709605 seconds
Exception in thread Thread-4 (benchmark_numexpr_re_evaluate):
Traceback (most recent call last):
  File "/Users/liuteng/opt/anaconda3/envs/ne/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Users/liuteng/opt/anaconda3/envs/ne/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/liuteng/Work/Workplace/Python/numexpr/bench/large_array_vs_numpy.py", line 94, in benchmark_numexpr_re_evaluate
    time_taken = timeit.timeit(
  File "/Users/liuteng/opt/anaconda3/envs/ne/lib/python3.10/timeit.py", line 234, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "/Users/liuteng/opt/anaconda3/envs/ne/lib/python3.10/timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "/Users/liuteng/Work/Workplace/Python/numexpr/bench/large_array_vs_numpy.py", line 95, in <lambda>
    lambda: ne.re_evaluate(
  File "/Users/liuteng/Work/Workplace/Python/numexpr/numexpr/necompiler.py", line 1002, in re_evaluate
    args = getArguments(argnames, local_dict, global_dict, _frame_depth=_frame_depth)
  File "/Users/liuteng/Work/Workplace/Python/numexpr/numexpr/necompiler.py", line 760, in getArguments
    for name in names:
TypeError: 'NoneType' object is not iterable
numexpr time (threaded with re_evaluate over 32 chunks with 2 threads): 2.472811 seconds
numexpr speedup: 1.90x

27rabbitlt, Dec 04 '25 12:12

If we change this line in the benchmark (https://github.com/pydata/numexpr/blob/master/bench/large_array_vs_numpy.py#L81) from

if index == 0:

to

if index == 0 or index == 1:

the problem seems to be solved.

Here index refers to the chunk index: we divide the large array into 32 chunks and assign even-indexed chunks to thread 0 and odd-indexed chunks to thread 1. The intent is to evaluate the first chunk and then call re_evaluate from then on. However, since the cache is thread-local, the argument-name cache is not shared between the two threads (thread 0 and thread 1). Thread 1 never sees chunk 0, so it has no cached argument names when it calls re_evaluate, which is why getArguments ends up iterating over None in the traceback above. So it seems that we need to evaluate chunk 0 in thread 0 and chunk 1 in thread 1.
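The mechanism can be reproduced with a small pure-Python model. This is a sketch under stated assumptions, not numexpr's code: `toy_evaluate` and `toy_re_evaluate` are hypothetical stand-ins that keep the last "compiled" expression in a `threading.local`, and `run()` assumes a round-robin chunk split (thread `tid` handles `tid`, `tid + n`, `tid + 2n`, ...), which reduces to the even/odd split above for 2 threads. With the `index == 0` rule, thread 1 starts with a re-evaluate on an empty cache and hits the same `TypeError`; evaluating the first chunk of every thread (`index < num_threads`, i.e. `index == 0 or index == 1` for 2 threads) avoids it.

```python
import threading

# Hypothetical toy model, NOT numexpr's internals: the "last compiled
# expression" lives in thread-local storage, so each thread has its own.
_state = threading.local()


def toy_evaluate(expr, local_dict):
    """'Compile' expr, remember it for this thread only, then run it."""
    argnames = sorted(local_dict)
    # eval() builds a Python function; a stand-in for numexpr's compiler.
    _state.func = eval(f"lambda {', '.join(argnames)}: {expr}")
    _state.argnames = argnames
    return toy_re_evaluate(local_dict)


def toy_re_evaluate(local_dict):
    """Re-run the last expression compiled *in the calling thread*."""
    argnames = getattr(_state, "argnames", None)  # None if never evaluated here
    args = [local_dict[name] for name in argnames]  # TypeError when None
    return _state.func(*args)


def run(num_chunks, num_threads, evaluate_first):
    """Process chunks round-robin; evaluate_first(index, n) selects evaluate()."""
    chunks = [(i, i + 1, i + 2) for i in range(num_chunks)]
    results = [None] * num_chunks
    errors = []

    def worker(tid):
        try:
            # Assumed round-robin split: thread tid gets tid, tid+n, tid+2n, ...
            for index in range(tid, num_chunks, num_threads):
                a, b, c = chunks[index]
                local = {"a": a, "b": b, "c": c}
                if evaluate_first(index, num_threads):
                    results[index] = toy_evaluate("a * b + c", local)
                else:
                    results[index] = toy_re_evaluate(local)
        except TypeError as exc:  # record the error instead of printing it
            errors.append((tid, str(exc)))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors


# Original rule: only chunk 0 is evaluated, so thread 1 starts with re_evaluate.
_, errors = run(32, 2, lambda index, n: index == 0)
print(errors)  # one entry from thread 1: 'NoneType' object is not iterable

# Fix: evaluate the first chunk each thread handles (index 0 or 1 for 2 threads).
results, errors = run(32, 2, lambda index, n: index < n)
print(errors, results[:4])  # [] [2, 5, 10, 17]
```

The `index < num_threads` rule is only a generalisation of the proposed fix to more threads under the round-robin assumption; the real benchmark may assign chunks differently.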

27rabbitlt, Dec 05 '25 22:12

The error was introduced by this PR/commit: https://github.com/pydata/numexpr/commit/33ee71b0c6f13224f3031cd8b42921c748ce9ede

Specifically, by this block:

https://github.com/pydata/numexpr/blob/33ee71b0c6f13224f3031cd8b42921c748ce9ede/numexpr/necompiler.py#L776-L783

That commit addressed the race condition by "localising everything", but in doing so it removed the globally shared cache of compiled expressions. In practice, this means an expression compiled in one thread cannot be re-evaluated from another thread. My change operated on the assumption that the compiled expression should be globally shared and cached, and that only the inputs (i.e. local_dict) should be localised to each thread; hence the context-aware dict was introduced. I'm not sure about the rationale for localising everything without caching; it looks like a new assumption to me. I'm happy to change the benchmark code to make it work if people agree on this new assumption.

emmaai, Dec 08 '25 07:12

Yes, that commit was added as part of the support for free-threaded Python, so you can proceed with this new assumption.

FrancescAlted, Dec 12 '25 13:12