BUG: Elementwise multiplication of arrays with ndim >= 2 fails to compile on macOS with mode=FAST_RUN
Describe the issue:
Hi all, I get this error when compiling any function that involves elementwise multiplication (edit: and, it seems, any binary op) of two arrays with 2 or more dimensions. This is on macOS arm64 with pytensor 2.31.7, installed from PyPI into a conda env. The core clang error is:
error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list
Compilation succeeds for vectors, on Linux, and with other modes such as "FAST_COMPILE".
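For context on what clang is complaining about, here is a minimal sketch (standard library only, not PyTensor code) that compares the width of the platform's `ssize_t` with a C `int`. The helper name `fits_in_c_int` is hypothetical, and the stride values in the comments assume a C-contiguous float64 array of shape (2, 3) as in the reproducer.

```python
import ctypes

# Width of the two C types named in the clang error, on this platform.
ssize_t_bytes = ctypes.sizeof(ctypes.c_ssize_t)
int_bytes = ctypes.sizeof(ctypes.c_int)


def fits_in_c_int(value: int) -> bool:
    """Hypothetical helper: True if value is representable as a signed C int."""
    bits = 8 * int_bytes
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


print(f"ssize_t: {ssize_t_bytes} bytes, int: {int_bytes} bytes")

# A C-contiguous float64 array of shape (2, 3) has byte strides (24, 8):
# 3 elements * 8 bytes per row, and 8 bytes per element.
for stride in (24, 8):
    print(stride, "fits in int:", fits_in_c_int(stride))
```

The strides in the reproducer are small, so the values themselves would fit. The error is about the types: a brace-initialized list of `int` is being filled from `ssize_t` variables, which C++11 treats as a narrowing conversion and clang rejects by default.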
Apologies if I am missing any context, and thanks for all the work on this fantastic project!
Reproducible code example:
import pytensor
import pytensor.tensor as pt
q = pt.tensor(dtype="float64", shape=(2, 3))
w = pt.tensor(dtype="float64", shape=(2, 3))
e = q * w
f = pytensor.function([q, w], [e], mode="FAST_RUN")
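A possible stopgap, sketched under assumptions rather than confirmed as the project's recommended fix: the error output names the diagnostic `-Wc++11-narrowing`, so passing clang the matching `-Wno-c++11-narrowing` flag through the `gcc__cxxflags` config option may let the module compile. This assumes that `gcc__cxxflags` can be reassigned at runtime and that its value is forwarded to `clang++`. The helper names `with_narrowing_flag` and `compile_repro` are hypothetical.

```python
NARROWING_FLAG = "-Wno-c++11-narrowing"  # negation of the diagnostic named in the error


def with_narrowing_flag(cxxflags: str) -> str:
    """Return cxxflags with the narrowing diagnostic disabled, without duplicating it."""
    parts = cxxflags.split()
    if NARROWING_FLAG not in parts:
        parts.append(NARROWING_FLAG)
    return " ".join(parts)


def compile_repro():
    """Hypothetical helper: re-run the reproducer with the extra flag set."""
    import pytensor
    import pytensor.tensor as pt

    # Assumption: gcc__cxxflags is writable at runtime and is passed to clang++.
    pytensor.config.gcc__cxxflags = with_narrowing_flag(pytensor.config.gcc__cxxflags)

    q = pt.tensor(dtype="float64", shape=(2, 3))
    w = pt.tensor(dtype="float64", shape=(2, 3))
    return pytensor.function([q, w], [q * w], mode="FAST_RUN")
```

Calling `compile_repro()` on the affected machine would show whether the flag is enough. Silencing the diagnostic only hides the narrowing, so this is a stopgap until the generated code uses a matching integer type.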
Error message:
---------------------------------------------------------------------------
CompileError Traceback (most recent call last)
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/vm.py:1230, in VMLinker.make_all(self, profiler, input_storage, output_storage, storage_map)
1226 # no-recycling is done at each VM.__call__ So there is
1227 # no need to cause duplicate c code by passing
1228 # no_recycling here.
1229 thunks.append(
-> 1230 node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
1231 )
1232 linker_make_thunk_time[node] = time.perf_counter() - thunk_start
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/op.py:125, in COp.make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
124 try:
--> 125 return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
126 except (NotImplementedError, MethodNotDefined):
127 # We requested the c code, so don't catch the error.
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/op.py:84, in COp.make_c_thunk(self, node, storage_map, compute_map, no_recycling)
83 raise NotImplementedError("float16")
---> 84 outputs = cl.make_thunk(
85 input_storage=node_input_storage, output_storage=node_output_storage
86 )
87 thunk, node_input_filters, node_output_filters = outputs
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/basic.py:1185, in CLinker.make_thunk(self, input_storage, output_storage, storage_map, cache, **kwargs)
1184 init_tasks, tasks = self.get_init_tasks()
-> 1185 cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
1186 input_storage, output_storage, storage_map, cache
1187 )
1189 res = _CThunk(cthunk, init_tasks, tasks, error_storage, module)
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/basic.py:1102, in CLinker.__compile__(self, input_storage, output_storage, storage_map, cache)
1101 output_storage = tuple(output_storage)
-> 1102 thunk, module = self.cthunk_factory(
1103 error_storage,
1104 input_storage,
1105 output_storage,
1106 storage_map,
1107 cache,
1108 )
1109 return (
1110 thunk,
1111 module,
(...) 1124 error_storage,
1125 )
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/basic.py:1626, in CLinker.cthunk_factory(self, error_storage, in_storage, out_storage, storage_map, cache)
1625 cache = get_module_cache()
-> 1626 module = cache.module_from_key(key=key, lnk=self)
1628 vars = self.inputs + self.outputs + self.orphans
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/cmodule.py:1251, in ModuleCache.module_from_key(self, key, lnk)
1250 location = dlimport_workdir(self.dirname)
-> 1251 module = lnk.compile_cmodule(location)
1252 name = module.__file__
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/basic.py:1527, in CLinker.compile_cmodule(self, location)
1526 _logger.debug(f"LOCATION {location}")
-> 1527 module = c_compiler.compile_str(
1528 module_name=mod.code_hash,
1529 src_code=src_code,
1530 location=location,
1531 include_dirs=self.header_dirs(),
1532 lib_dirs=self.lib_dirs(),
1533 libs=libs,
1534 preargs=preargs,
1535 )
1536 except Exception as e:
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/cmodule.py:2678, in GCC_compiler.compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, py_module, hide_symbols)
2674 # We replace '\n' by '. ' in the error message because when Python
2675 # prints the exception, having '\n' in the text makes it more
2676 # difficult to read.
2677 # compile_stderr = compile_stderr.replace("\n", ". ")
-> 2678 raise CompileError(
2679 f"Compilation failed (return status={status}):\n{' '.join(cmd)}\n{compile_stderr}"
2680 )
2681 elif config.cmodule__compilation_warning and compile_stderr:
2682 # Print errors just below the command line.
CompileError: Compilation failed (return status=1):
/usr/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -fPIC -undefined dynamic_lookup -ld64 -I/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/numpy/_core/include -I/Users/johnnie/miniforge3/envs/py312/include/python3.12 -I/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/c_code -L/Users/johnnie/miniforge3/envs/py312/lib -fvisibility=hidden -o /Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/m6b335b9adb43ae602698b6a3c8dda3171294f510564aaadccf2d53b79676b752.so /Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:594:9: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
594 | V5_stride0, V5_stride1,
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:594:9: note: insert an explicit cast to silence this issue
594 | V5_stride0, V5_stride1,
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:594:21: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
594 | V5_stride0, V5_stride1,
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:594:21: note: insert an explicit cast to silence this issue
594 | V5_stride0, V5_stride1,
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:595:1: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
595 | V3_stride0, V3_stride1,
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:595:1: note: insert an explicit cast to silence this issue
595 | V3_stride0, V3_stride1,
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:595:13: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
595 | V3_stride0, V3_stride1,
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:595:13: note: insert an explicit cast to silence this issue
595 | V3_stride0, V3_stride1,
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:596:1: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
596 | V1_stride0, V1_stride1
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:596:1: note: insert an explicit cast to silence this issue
596 | V1_stride0, V1_stride1
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:596:13: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
596 | V1_stride0, V1_stride1
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:596:13: note: insert an explicit cast to silence this issue
596 | V1_stride0, V1_stride1
| ^~~~~~~~~~
| static_cast<int>( )
6 errors generated.
During handling of the above exception, another exception occurred:
CompileError Traceback (most recent call last)
Cell In[16], line 7
5 w = pt.tensor(dtype="float64", shape=(2, 3))
6 e = q * w
----> 7 f = pytensor.function([q, w], [e], mode="FAST_RUN")
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/compile/function/__init__.py:332, in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, trust_input)
321 fn = orig_function(
322 inputs,
323 outputs,
(...) 327 trust_input=trust_input,
328 )
329 else:
330 # note: pfunc will also call orig_function -- orig_function is
331 # a choke point that all compilation must pass through
--> 332 fn = pfunc(
333 params=inputs,
334 outputs=outputs,
335 mode=mode,
336 updates=updates,
337 givens=givens,
338 no_default_updates=no_default_updates,
339 accept_inplace=accept_inplace,
340 name=name,
341 rebuild_strict=rebuild_strict,
342 allow_input_downcast=allow_input_downcast,
343 on_unused_input=on_unused_input,
344 profile=profile,
345 output_keys=output_keys,
346 trust_input=trust_input,
347 )
348 return fn
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/compile/function/pfunc.py:466, in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys, fgraph, trust_input)
452 profile = ProfileStats(message=profile)
454 inputs, cloned_outputs = construct_pfunc_ins_and_outs(
455 params,
456 outputs,
(...) 463 fgraph=fgraph,
464 )
--> 466 return orig_function(
467 inputs,
468 cloned_outputs,
469 mode,
470 accept_inplace=accept_inplace,
471 name=name,
472 profile=profile,
473 on_unused_input=on_unused_input,
474 output_keys=output_keys,
475 fgraph=fgraph,
476 trust_input=trust_input,
477 )
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/compile/function/types.py:1835, in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input, output_keys, fgraph, trust_input)
1822 m = Maker(
1823 inputs,
1824 outputs,
(...) 1832 trust_input=trust_input,
1833 )
1834 with config.change_flags(compute_test_value="off"):
-> 1835 fn = m.create(defaults)
1836 finally:
1837 if profile and fn:
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/compile/function/types.py:1719, in FunctionMaker.create(self, input_storage, storage_map)
1716 start_import_time = pytensor.link.c.cmodule.import_time
1718 with config.change_flags(traceback__limit=config.traceback__compile_limit):
-> 1719 _fn, _i, _o = self.linker.make_thunk(
1720 input_storage=input_storage_lists, storage_map=storage_map
1721 )
1723 end_linker = time.perf_counter()
1725 linker_time = end_linker - start_linker
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/basic.py:245, in LocalLinker.make_thunk(self, input_storage, output_storage, storage_map, **kwargs)
238 def make_thunk(
239 self,
240 input_storage: Optional["InputStorageType"] = None,
(...) 243 **kwargs,
244 ) -> tuple["BasicThunkType", "InputStorageType", "OutputStorageType"]:
--> 245 return self.make_all(
246 input_storage=input_storage,
247 output_storage=output_storage,
248 storage_map=storage_map,
249 )[:3]
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/vm.py:1239, in VMLinker.make_all(self, profiler, input_storage, output_storage, storage_map)
1237 thunks[-1].lazy = False
1238 except Exception:
-> 1239 raise_with_op(fgraph, node)
1241 t1 = time.perf_counter()
1243 if self.profile:
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/utils.py:526, in raise_with_op(fgraph, node, thunk, exc_info, storage_map)
521 warnings.warn(
522 f"{exc_type} error does not allow us to add an extra error message"
523 )
524 # Some exception need extra parameter in inputs. So forget the
525 # extra long error message in that case.
--> 526 raise exc_value.with_traceback(exc_trace)
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/vm.py:1230, in VMLinker.make_all(self, profiler, input_storage, output_storage, storage_map)
1225 thunk_start = time.perf_counter()
1226 # no-recycling is done at each VM.__call__ So there is
1227 # no need to cause duplicate c code by passing
1228 # no_recycling here.
1229 thunks.append(
-> 1230 node.op.make_thunk(node, storage_map, compute_map, [], impl=impl)
1231 )
1232 linker_make_thunk_time[node] = time.perf_counter() - thunk_start
1233 if not hasattr(thunks[-1], "lazy"):
1234 # We don't want all ops maker to think about lazy Ops.
1235 # So if they didn't specify that its lazy or not, it isn't.
1236 # If this member isn't present, it will crash later.
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/op.py:125, in COp.make_thunk(self, node, storage_map, compute_map, no_recycling, impl)
121 self.prepare_node(
122 node, storage_map=storage_map, compute_map=compute_map, impl="c"
123 )
124 try:
--> 125 return self.make_c_thunk(node, storage_map, compute_map, no_recycling)
126 except (NotImplementedError, MethodNotDefined):
127 # We requested the c code, so don't catch the error.
128 if impl == "c":
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/op.py:84, in COp.make_c_thunk(self, node, storage_map, compute_map, no_recycling)
82 warnings.warn(f"Disabling C code for {self} due to unsupported float16")
83 raise NotImplementedError("float16")
---> 84 outputs = cl.make_thunk(
85 input_storage=node_input_storage, output_storage=node_output_storage
86 )
87 thunk, node_input_filters, node_output_filters = outputs
89 if compute_map is None:
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/basic.py:1185, in CLinker.make_thunk(self, input_storage, output_storage, storage_map, cache, **kwargs)
1150 """Compile this linker's `self.fgraph` and return a function that performs the computations.
1151
1152 The return values can be used as follows:
(...) 1182
1183 """
1184 init_tasks, tasks = self.get_init_tasks()
-> 1185 cthunk, module, in_storage, out_storage, error_storage = self.__compile__(
1186 input_storage, output_storage, storage_map, cache
1187 )
1189 res = _CThunk(cthunk, init_tasks, tasks, error_storage, module)
1190 res.nodes = self.node_order
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/basic.py:1102, in CLinker.__compile__(self, input_storage, output_storage, storage_map, cache)
1100 input_storage = tuple(input_storage)
1101 output_storage = tuple(output_storage)
-> 1102 thunk, module = self.cthunk_factory(
1103 error_storage,
1104 input_storage,
1105 output_storage,
1106 storage_map,
1107 cache,
1108 )
1109 return (
1110 thunk,
1111 module,
(...) 1124 error_storage,
1125 )
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/basic.py:1626, in CLinker.cthunk_factory(self, error_storage, in_storage, out_storage, storage_map, cache)
1624 if cache is None:
1625 cache = get_module_cache()
-> 1626 module = cache.module_from_key(key=key, lnk=self)
1628 vars = self.inputs + self.outputs + self.orphans
1629 # List of indices that should be ignored when passing the arguments
1630 # (basically, everything that the previous call to uniq eliminated)
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/cmodule.py:1251, in ModuleCache.module_from_key(self, key, lnk)
1249 try:
1250 location = dlimport_workdir(self.dirname)
-> 1251 module = lnk.compile_cmodule(location)
1252 name = module.__file__
1253 assert name.startswith(location)
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/basic.py:1527, in CLinker.compile_cmodule(self, location)
1525 try:
1526 _logger.debug(f"LOCATION {location}")
-> 1527 module = c_compiler.compile_str(
1528 module_name=mod.code_hash,
1529 src_code=src_code,
1530 location=location,
1531 include_dirs=self.header_dirs(),
1532 lib_dirs=self.lib_dirs(),
1533 libs=libs,
1534 preargs=preargs,
1535 )
1536 except Exception as e:
1537 e.args += (str(self.fgraph),)
File ~/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/cmodule.py:2678, in GCC_compiler.compile_str(module_name, src_code, location, include_dirs, lib_dirs, libs, preargs, py_module, hide_symbols)
2670 print(
2671 "Check if package python-dev or python-devel is installed."
2672 )
2674 # We replace '\n' by '. ' in the error message because when Python
2675 # prints the exception, having '\n' in the text makes it more
2676 # difficult to read.
2677 # compile_stderr = compile_stderr.replace("\n", ". ")
-> 2678 raise CompileError(
2679 f"Compilation failed (return status={status}):\n{' '.join(cmd)}\n{compile_stderr}"
2680 )
2681 elif config.cmodule__compilation_warning and compile_stderr:
2682 # Print errors just below the command line.
2683 print(compile_stderr)
CompileError: Compilation failed (return status=1):
/usr/bin/clang++ -dynamiclib -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -fPIC -undefined dynamic_lookup -ld64 -I/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/numpy/_core/include -I/Users/johnnie/miniforge3/envs/py312/include/python3.12 -I/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/pytensor/link/c/c_code -L/Users/johnnie/miniforge3/envs/py312/lib -fvisibility=hidden -o /Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/m6b335b9adb43ae602698b6a3c8dda3171294f510564aaadccf2d53b79676b752.so /Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:594:9: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
594 | V5_stride0, V5_stride1,
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:594:9: note: insert an explicit cast to silence this issue
594 | V5_stride0, V5_stride1,
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:594:21: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
594 | V5_stride0, V5_stride1,
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:594:21: note: insert an explicit cast to silence this issue
594 | V5_stride0, V5_stride1,
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:595:1: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
595 | V3_stride0, V3_stride1,
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:595:1: note: insert an explicit cast to silence this issue
595 | V3_stride0, V3_stride1,
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:595:13: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
595 | V3_stride0, V3_stride1,
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:595:13: note: insert an explicit cast to silence this issue
595 | V3_stride0, V3_stride1,
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:596:1: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
596 | V1_stride0, V1_stride1
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:596:1: note: insert an explicit cast to silence this issue
596 | V1_stride0, V1_stride1
| ^~~~~~~~~~
| static_cast<int>( )
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:596:13: error: non-constant-expression cannot be narrowed from type 'ssize_t' (aka 'long') to 'int' in initializer list [-Wc++11-narrowing]
596 | V1_stride0, V1_stride1
| ^~~~~~~~~~
/Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64/tmp33u3kj1w/mod.cpp:596:13: note: insert an explicit cast to silence this issue
596 | V1_stride0, V1_stride1
| ^~~~~~~~~~
| static_cast<int>( )
6 errors generated.
Apply node that caused the error: Mul(<Matrix(float64, shape=(2, 3))>, <Matrix(float64, shape=(2, 3))>)
Toposort index: 0
Inputs types: [TensorType(float64, shape=(2, 3)), TensorType(float64, shape=(2, 3))]
Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
File "/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/ipykernel/zmqshell.py", line 549, in run_cell
return super().run_cell(*args, **kwargs)
File "/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3098, in run_cell
result = self._run_cell(
File "/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3153, in _run_cell
result = runner(coro)
File "/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/IPython/core/async_helpers.py", line 128, in _pseudo_sync_runner
coro.send(None)
File "/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3365, in run_cell_async
has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
File "/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3610, in run_ast_nodes
if await self.run_code(code, result, async_=asy):
File "/Users/johnnie/miniforge3/envs/py312/lib/python3.12/site-packages/IPython/core/interactiveshell.py", line 3670, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "/var/folders/7j/xck3s4z51278q3grnpy1gtpr0000gn/T/ipykernel_66037/2057199314.py", line 6, in <module>
e = q * w
HINT: Use a linker other than the C linker to print the inputs' shapes and strides.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
PyTensor version information:
Note: float16 support is experimental, use at your own risk. Value: float64
warn_float64 ({'ignore', 'warn', 'pdb', 'raise'}) Doc: Do an action when a tensor variable with float64 dtype is created. Value: ignore
pickle_test_value (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103bfd880>>) Doc: Dump test values while pickling model. If True, test values will be dumped with model. Value: True
cast_policy ({'numpy+floatX', 'custom'}) Doc: Rules for implicit type casting Value: custom
device (cpu) Doc: Default device for computations. only cpu is supported for now Value: cpu
conv__assert_shape (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x102d46810>>) Doc: If True, AbstractConv* ops will verify that user-provided shapes match the runtime shapes (debugging option, may slow down compilation) Value: False
print_global_stats (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103a9f080>>) Doc: Print some global statistics (time spent) at the end Value: False
unpickle_function (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d2d760>>) Doc: Replace unpickled PyTensor functions with None. This is useful to unpickle old graphs that pickled them when it shouldn't Value: True
<pytensor.configparser.ConfigParam object at 0x103d2d700> Doc: Default compilation mode Value: Mode
cxx (<class 'str'>) Doc: The C++ compiler to use. Currently only g++ is supported, but supporting additional compilers should not be too difficult. If it is empty, no C++ code is compiled. Value: /usr/bin/clang++
linker ({'c', 'cvm_nogc', 'vm_nogc', 'vm', 'py', 'c|py', 'cvm', 'c|py_nogc'}) Doc: Default linker used if the pytensor flags mode is Mode Value: cvm
allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d92bd0>>) Doc: Do we default to delete intermediate results during PyTensor function calls? Doing so lowers the memory requirement, but asks that we reallocate memory at the next function call. This is implemented for the default linker, but may not work for all linkers. Value: True
optimizer ({'o1', 'None', 'o3', 'unsafe', 'o2', 'o4', 'fast_run', 'fast_compile', 'merge'}) Doc: Default optimizer. If not None, will use this optimizer with the Mode Value: o4
optimizer_verbose (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d92ea0>>) Doc: Print information about rewrites that are applied during a graph transformation. Value: False
optimizer_verbose_ignore (<class 'str'>)
Doc: Do not print information for rewrites with these names when optimizer_verbose is True. Separate names with ','
Value:
on_opt_error ({'ignore', 'warn', 'pdb', 'raise'}) Doc: What to do when an optimization crashes: warn and skip it, raise the exception, or fall into the pdb debugger. Value: warn
nocleanup (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d925d0>>) Doc: Suppress the deletion of code files that did not compile cleanly Value: False
on_unused_input ({'ignore', 'warn', 'raise'}) Doc: What to do if a variable in the 'inputs' list of pytensor.function() is not used in the graph. Value: raise
gcc__cxxflags (<class 'str'>) Doc: Extra compiler flags for gcc Value:
cmodule__warn_no_version (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1034c2db0>>) Doc: If True, will print a warning when compiling one or more Op with C code that can't be cached because there is no c_code_cache_version() function associated to at least one of those Ops. Value: False
cmodule__remove_gxx_opt (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d930b0>>) Doc: If True, will remove the -O* parameter passed to g++.This is useful to debug in gdb modules compiled by PyTensor.The parameter -g is passed by default to g++ Value: False
cmodule__compilation_warning (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1030e5ca0>>) Doc: If True, will print compilation warnings. Value: False
cmodule__preload_cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d91dc0>>) Doc: If set to True, will preload the C module cache at import time Value: False
cmodule__age_thresh_use (<class 'int'>) Doc: In seconds. The time after which PyTensor won't reuse a compile c module. Value: 2073600
cmodule__debug (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103beb5c0>>) Doc: If True, define a DEBUG macro (if not exists) for any compiled C code. Value: False
compile__wait (<class 'int'>) Doc: Time to wait before retrying to acquire the compile lock. Value: 5
compile__timeout (<class 'int'>) Doc: In seconds, time that a process will wait before deciding to override an existing lock. An override only happens when the existing lock is held by the same owner and has not been 'refreshed' by this owner for more than this period. Refreshes are done every half timeout period for running processes. Value: 120
tensor__cmp_sloppy (<class 'int'>) Doc: Relax pytensor.tensor.math._allclose (0) not at all, (1) a bit, (2) more Value: 0
lib__amdlibm (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d92ff0>>) Doc: Use amd's amdlibm numerical library Value: False
tensor__insert_inplace_optimizer_validate_nb (<class 'int'>) Doc: -1: auto, if graph have less then 500 nodes 1, else 10 Value: -1
traceback__limit (<class 'int'>) Doc: The number of stack to trace. -1 mean all. Value: 8
traceback__compile_limit (<class 'int'>) Doc: The number of stack to trace to keep during compilation. -1 mean all. If greater then 0, will also make us save PyTensor internal stack trace. Value: 0
warn__ignore_bug_before ({'0.8.2', '0.7', '1.0.2', '0.6', '1.0.4', '0.10', '0.9', '0.4', '0.8.1', '0.8', '1.0.1', '0.3', 'all', '0.5', '1.0.3', '0.4.1', '1.0', '1.0.5', 'None'}) Doc: If 'None', we warn about all PyTensor bugs found by default. If 'all', we don't warn about PyTensor bugs found by default. If a version, we print only the warnings relative to PyTensor bugs found after that version. Warning for specific bugs can be configured with specific [warn] flags. Value: 0.9
exception_verbosity ({'high', 'low'}) Doc: If 'low', the text of exceptions will generally refer to apply nodes with short names such as Elemwise{add_no_inplace}. If 'high', some exceptions will also refer to apply nodes with long descriptions like: A. Elemwise{add_no_inplace} B. log_likelihood_v_given_h C. log_likelihood_h Value: low
print_test_value (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93920>>)
Doc: If 'True', the eval of an PyTensor variable will return its test_value when this is available. This has the practical consequence that, e.g., in debugging my_var will print the same as my_var.tag.test_value when a test value is defined.
Value: False
compute_test_value ({'ignore', 'pdb', 'off', 'raise', 'warn'}) Doc: If 'True', PyTensor will run each op at graph build time, using Constants, SharedVariables and the tag 'test_value' as inputs to the function. This helps the user track down problems in the graph before it gets optimized. Value: off
compute_test_value_opt ({'ignore', 'pdb', 'off', 'raise', 'warn'}) Doc: For debugging PyTensor optimization only. Same as compute_test_value, but is used during PyTensor optimization Value: off
check_input (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1030c2fc0>>) Doc: Specify if types should check their input in their C code. It can be used to speed up compilation, reduce overhead (particularly for scalars) and reduce the number of generated C files. Value: True
NanGuardMode__nan_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d934d0>>) Doc: Default value for nan_is_error Value: True
NanGuardMode__inf_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93590>>) Doc: Default value for inf_is_error Value: True
NanGuardMode__big_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93500>>) Doc: Default value for big_is_error Value: True
NanGuardMode__action ({'warn', 'pdb', 'raise'}) Doc: What NanGuardMode does when it finds a problem Value: raise
DebugMode__patience (<class 'int'>) Doc: Optimize graph this many times to detect inconsistency Value: 10
DebugMode__check_c (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93560>>) Doc: Run C implementations where possible Value: True
DebugMode__check_py (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93650>>) Doc: Run Python implementations where possible Value: True
DebugMode__check_finite (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93680>>) Doc: True -> complain about NaN/Inf results Value: True
DebugMode__check_strides (<class 'int'>) Doc: Check that Python- and C-produced ndarrays have same strides. On difference: (0) - ignore, (1) warn, or (2) raise error Value: 0
DebugMode__warn_input_not_reused (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93470>>) Doc: Generate a warning when destroy_map or view_map says that an op works inplace, but the op did not reuse the input for its output. Value: True
DebugMode__check_preallocated_output (<class 'str'>) Doc: Test thunks with pre-allocated memory as output storage. This is a list of strings separated by ":". Valid values are: "initial" (initial storage in storage map, happens with Scan),"previous" (previously-returned memory), "c_contiguous", "f_contiguous", "strided" (positive and negative strides), "wrong_size" (larger and smaller dimensions), and "ALL" (all of the above). Value:
DebugMode__check_preallocated_output_ndim (<class 'int'>) Doc: When testing with "strided" preallocated output memory, test all combinations of strides over that number of (inner-most) dimensions. You may want to reduce that number to reduce memory or time usage, but it is advised to keep a minimum of 2. Value: 4
profiling__time_thunks (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d937a0>>) Doc: Time individual thunks when profiling Value: True
profiling__n_apply (<class 'int'>) Doc: Number of Apply instances to print by default Value: 20
profiling__n_ops (<class 'int'>) Doc: Number of Ops to print by default Value: 20
profiling__output_line_width (<class 'int'>) Doc: Max line width for the profiling output Value: 512
profiling__min_memory_size (<class 'int'>) Doc: For the memory profile, do not print Apply nodes if the size of their outputs (in bytes) is lower than this threshold Value: 1024
profiling__min_peak_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x102d22f30>>) Doc: The min peak memory usage of the order Value: False
profiling__destination (<class 'str'>) Doc: File destination of the profiling output Value: stderr
profiling__debugprint (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d939e0>>) Doc: Do a debugprint of the profiled functions Value: False
profiling__ignore_first_call (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93980>>) Doc: Do we ignore the first call of an PyTensor function. Value: False
on_shape_error ({'warn', 'raise'}) Doc: warn: print a warning and use the default value. raise: raise an error Value: warn
openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93860>>) Doc: Allow (or not) parallel computation on the CPU with OpenMP. This is the default value used when creating an Op that supports OpenMP parallelization. It is preferable to define it via the PyTensor configuration file ~/.pytensorrc or with the environment variable PYTENSOR_FLAGS. Parallelization is only done for some operations that implement it, and even for operations that implement parallelism, each operation is free to respect this flag or not. You can control the number of threads used with the environment variable OMP_NUM_THREADS. If it is set to 1, we disable openmp in PyTensor by default. Value: False
openmp_elemwise_minsize (<class 'int'>) Doc: If OpenMP is enabled, this is the minimum size of vectors for which the openmp parallelization is enabled in element wise ops. Value: 200000
optimizer_excluding (<class 'str'>) Doc: When using the default mode, we will remove optimizer with these tags. Separate tags with ':'. Value:
optimizer_including (<class 'str'>) Doc: When using the default mode, we will add optimizer with these tags. Separate tags with ':'. Value:
optimizer_requiring (<class 'str'>) Doc: When using the default mode, we will require optimizer with these tags. Separate tags with ':'. Value:
optdb__position_cutoff (<class 'float'>) Doc: Where to stop earlier during optimization. It represent the position of the optimizer where to stop. Value: inf
optdb__max_use_ratio (<class 'float'>) Doc: A ratio that prevent infinite loop in EquilibriumGraphRewriter. Value: 8.0
cycle_detection ({'regular', 'fast'}) Doc: If cycle_detection is set to regular, most inplaces are allowed,but it is slower. If cycle_detection is set to faster, less inplacesare allowed, but it makes the compilation faster.The interaction of which one give the lower peak memory usage iscomplicated and not predictable, so if you are close to the peakmemory usage, triyng both could give you a small gain. Value: regular
check_stack_trace ({'warn', 'log', 'raise', 'off'}) Doc: A flag for checking the stack trace during the optimization process. default (off): does not check the stack trace of any optimization log: inserts a dummy stack trace that identifies the optimizationthat inserted the variable that had an empty stack trace.warn: prints a warning if a stack trace is missing and also a dummystack trace is inserted that indicates which optimization insertedthe variable that had an empty stack trace.raise: raises an exception if a stack trace is missing Value: off
metaopt__verbose (<class 'int'>) Doc: 0 for silent, 1 for only warnings, 2 for full output withtimings and selected implementation Value: 0
unittests__rseed (<class 'str'>) Doc: Seed to use for randomized unit tests. Special value 'random' means using a seed of None. Value: 666
warn__round (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93b30>>)
Doc: Warn when using tensor.round with the default mode. Round changed its default from half_away_from_zero to half_to_even to have the same default as NumPy.
Value: False
profile (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93b60>>) Doc: If VM should collect profile information Value: False
profile_optimizer (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93b90>>) Doc: If VM should collect optimizer profile information Value: False
profile_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93bc0>>) Doc: If VM should collect memory profile information and print it Value: False
<pytensor.configparser.ConfigParam object at 0x103d93bf0> Doc: Useful only for the VM Linkers. When lazy is None, auto detect if lazy evaluation is needed and use the appropriate version. If the C loop isn't being used and lazy is True, use the Stack VM; otherwise, use the Loop VM. Value: None
numba__vectorize_target ({'cuda', 'parallel', 'cpu'}) Doc: Default target for numba.vectorize. Value: cpu
numba__fastmath (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93c50>>) Doc: If True, use Numba's fastmath mode. Value: True
numba__cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x103d93c80>>) Doc: If True, use Numba's file based caching. Value: True
compiledir_format (<class 'str'>) Doc: Format string for platform-dependent compiled module subdirectory (relative to base_compiledir). Available keys: device, gxx_version, hostname, numpy_version, platform, processor, pytensor_version, python_bitwidth, python_int_bitwidth, python_version, short_platform. Defaults to compiledir_%(short_platform)s-%(processor)s- %(python_version)s-%(python_bitwidth)s. Value: compiledir_%(short_platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s
<pytensor.configparser.ConfigParam object at 0x103d93e30> Doc: platform-independent root directory for compiled modules Value: /Users/johnnie/.pytensor
<pytensor.configparser.ConfigParam object at 0x103d93dd0> Doc: platform-dependent cache directory for compiled modules Value: /Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.12.10-64
blas__ldflags (<class 'str'>) Doc: lib[s] to include for [Fortran] level-3 blas implementation Value: -framework Accelerate -Wl,-rpath,/Users/johnnie/miniforge3/envs/py312/lib
blas__check_openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x1040ca9c0>>) Doc: Check for openmp library conflict. WARNING: Setting this to False leaves you open to wrong results in blas-related operations. Value: True
scan__allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x119882900>>) Doc: Allow/disallow gc inside of Scan (default: False) Value: False
scan__allow_output_prealloc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x11971e810>>) Doc: Allow/disallow memory preallocation for outputs inside of scan (default: True) Value: True
Context for the issue:
No response
Can you install from conda-forge instead of pip? It should take care of linking to the right compiler/settings, which is something pip can't do.
Yes I tried in some fresh environments with a few different versions of python (3.11 - 3.13) and pytensor (2.20 - 2.31) and the error does seem to stay the same.
(Example config output from conda forge install if helpful:)
/Users/johnnie/miniforge3/envs/pytensor/lib/python3.11/site-packages/pytensor/link/c/cmodule.py:2968: UserWarning: PyTensor could not link to a BLAS installation. Operations that might benefit from BLAS will be severely degraded.
This usually happens when PyTensor is installed via pip. We recommend it be installed via conda/mamba/pixi instead.
Alternatively, you can use an experimental backend such as Numba or JAX that perform their own BLAS optimizations, by setting `pytensor.config.mode == 'NUMBA'` or passing `mode='NUMBA'` when compiling a PyTensor function.
For more options and details see https://pytensor.readthedocs.io/en/latest/troubleshooting.html#how-do-i-configure-test-my-blas-library
warnings.warn(
floatX ({'float64', 'float16', 'float32'})
Doc: Default floating-point precision for python casts.
Note: float16 support is experimental, use at your own risk.
Value: float64
warn_float64 ({'pdb', 'warn', 'ignore', 'raise'})
Doc: Do an action when a tensor variable with float64 dtype is created.
Value: ignore
pickle_test_value (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105b79790>>)
Doc: Dump test values while pickling model. If True, test values will be dumped with model.
Value: True
cast_policy ({'numpy+floatX', 'custom'})
Doc: Rules for implicit type casting
Value: custom
device (cpu)
Doc: Default device for computations. only cpu is supported for now
Value: cpu
conv__assert_shape (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105aaaad0>>)
Doc: If True, AbstractConv* ops will verify that user-provided shapes match the runtime shapes (debugging option, may slow down compilation)
Value: False
print_global_stats (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c8be50>>)
Doc: Print some global statistics (time spent) at the end
Value: False
unpickle_function (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c94050>>)
Doc: Replace unpickled PyTensor functions with None. This is useful to unpickle old graphs that pickled them when it shouldn't
Value: True
<pytensor.configparser.ConfigParam object at 0x104c60f10>
Doc: Default compilation mode
Value: Mode
cxx (<class 'str'>)
Doc: The C++ compiler to use. Currently only g++ is supported, but supporting additional compilers should not be too difficult. If it is empty, no C++ code is compiled.
Value: /Users/johnnie/miniforge3/envs/pytensor/bin/clang++
linker ({'vm', 'vm_nogc', 'cvm_nogc', 'c|py_nogc', 'cvm', 'py', 'c|py', 'c'})
Doc: Default linker used if the pytensor flags mode is Mode
Value: cvm
allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x104c2b050>>)
Doc: Do we default to delete intermediate results during PyTensor function calls? Doing so lowers the memory requirement, but asks that we reallocate memory at the next function call. This is implemented for the default linker, but may not work for all linkers.
Value: True
optimizer ({'fast_run', 'fast_compile', 'o4', 'unsafe', 'o1', 'merge', 'None', 'o2', 'o3'})
Doc: Default optimizer. If not None, will use this optimizer with the Mode
Value: o4
optimizer_verbose (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105b99810>>)
Doc: Print information about rewrites that are applied during a graph transformation.
Value: False
optimizer_verbose_ignore (<class 'str'>)
Doc: Do not print information for rewrites with these names when `optimizer_verbose` is `True`. Separate names with ','
Value:
on_opt_error ({'pdb', 'warn', 'ignore', 'raise'})
Doc: What to do when an optimization crashes: warn and skip it, raise the exception, or fall into the pdb debugger.
Value: warn
nocleanup (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105995a50>>)
Doc: Suppress the deletion of code files that did not compile cleanly
Value: False
on_unused_input ({'warn', 'ignore', 'raise'})
Doc: What to do if a variable in the 'inputs' list of pytensor.function() is not used in the graph.
Value: raise
gcc__cxxflags (<class 'str'>)
Doc: Extra compiler flags for gcc
Value:
cmodule__warn_no_version (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c95750>>)
Doc: If True, will print a warning when compiling one or more Op with C code that can't be cached because there is no c_code_cache_version() function associated to at least one of those Ops.
Value: False
cmodule__remove_gxx_opt (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105b9a950>>)
Doc: If True, will remove the -O* parameter passed to g++.This is useful to debug in gdb modules compiled by PyTensor.The parameter -g is passed by default to g++
Value: False
cmodule__compilation_warning (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c953d0>>)
Doc: If True, will print compilation warnings.
Value: False
cmodule__preload_cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c954d0>>)
Doc: If set to True, will preload the C module cache at import time
Value: False
cmodule__age_thresh_use (<class 'int'>) Doc: In seconds. The time after which PyTensor won't reuse a compile c module. Value: 2073600
cmodule__debug (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c94350>>)
Doc: If True, define a DEBUG macro (if not exists) for any compiled C code.
Value: False
compile__wait (<class 'int'>)
Doc: Time to wait before retrying to acquire the compile lock.
Value: 5
compile__timeout (<class 'int'>)
Doc: In seconds, time that a process will wait before deciding to
override an existing lock. An override only happens when the existing
lock is held by the same owner *and* has not been 'refreshed' by this
owner for more than this period. Refreshes are done every half timeout
period for running processes.
Value: 120
tensor__cmp_sloppy (<class 'int'>)
Doc: Relax pytensor.tensor.math._allclose (0) not at all, (1) a bit, (2) more Value: 0
lib__amdlibm (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x104c60550>>)
Doc: Use amd's amdlibm numerical library
Value: False
tensor__insert_inplace_optimizer_validate_nb (<class 'int'>)
Doc: -1: auto, if graph have less then 500 nodes 1, else 10 Value: -1
traceback__limit (<class 'int'>)
Doc: The number of stack to trace. -1 mean all.
Value: 8
traceback__compile_limit (<class 'int'>)
Doc: The number of stack to trace to keep during compilation. -1 mean all. If greater then 0, will also make us save PyTensor internal stack trace. Value: 0
warn__ignore_bug_before ({'0.7', 'all', '1.0.2', '0.4', '0.6', 'None', '1.0.3', '0.3', '1.0.1', '1.0.4', '0.8.2', '0.4.1', '0.10', '1.0', '1.0.5', '0.5', '0.9', '0.8.1', '0.8'})
Doc: If 'None', we warn about all PyTensor bugs found by default. If 'all', we don't warn about PyTensor bugs found by default. If a version, we print only the warnings relative to PyTensor bugs found after that version. Warning for specific bugs can be configured with specific [warn] flags.
Value: 0.9
exception_verbosity ({'high', 'low'}) Doc: If 'low', the text of exceptions will generally refer to apply nodes with short names such as Elemwise{add_no_inplace}. If 'high', some exceptions will also refer to apply nodes with long descriptions like:
A. Elemwise{add_no_inplace}
B. log_likelihood_v_given_h
C. log_likelihood_h
Value: low
print_test_value (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c95550>>)
Doc: If 'True', the __eval__ of an PyTensor variable will return its test_value when this is available. This has the practical consequence that, e.g., in debugging `my_var` will print the same as `my_var.tag.test_value` when a test value is defined.
Value: False
compute_test_value ({'pdb', 'ignore', 'raise', 'warn', 'off'})
Doc: If 'True', PyTensor will run each op at graph build time, using Constants, SharedVariables and the tag 'test_value' as inputs to the function. This helps the user track down problems in the graph before it gets optimized.
Value: off
compute_test_value_opt ({'pdb', 'ignore', 'raise', 'warn', 'off'})
Doc: For debugging PyTensor optimization only. Same as compute_test_value, but is used during PyTensor optimization
Value: off
check_input (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x104c48450>>)
Doc: Specify if types should check their input in their C code. It can be used to speed up compilation, reduce overhead (particularly for scalars) and reduce the number of generated C files.
Value: True
NanGuardMode__nan_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c95910>>)
Doc: Default value for nan_is_error
Value: True
NanGuardMode__inf_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105be0510>>)
Doc: Default value for inf_is_error
Value: True
NanGuardMode__big_is_error (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105915950>>)
Doc: Default value for big_is_error
Value: True
NanGuardMode__action ({'pdb', 'warn', 'raise'})
Doc: What NanGuardMode does when it finds a problem
Value: raise
DebugMode__patience (<class 'int'>)
Doc: Optimize graph this many times to detect inconsistency
Value: 10
DebugMode__check_c (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105b9aa10>>)
Doc: Run C implementations where possible
Value: True
DebugMode__check_py (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c8bd90>>)
Doc: Run Python implementations where possible
Value: True
DebugMode__check_finite (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x104ff5a10>>)
Doc: True -> complain about NaN/Inf results
Value: True
DebugMode__check_strides (<class 'int'>)
Doc: Check that Python- and C-produced ndarrays have same strides. On difference: (0) - ignore, (1) warn, or (2) raise error
Value: 0
DebugMode__warn_input_not_reused (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x10595d950>>)
Doc: Generate a warning when destroy_map or view_map says that an op works inplace, but the op did not reuse the input for its output.
Value: True
DebugMode__check_preallocated_output (<class 'str'>)
Doc: Test thunks with pre-allocated memory as output storage. This is a list of strings separated by ":". Valid values are: "initial" (initial storage in storage map, happens with Scan),"previous" (previously-returned memory), "c_contiguous", "f_contiguous", "strided" (positive and negative strides), "wrong_size" (larger and smaller dimensions), and "ALL" (all of the above).
Value:
DebugMode__check_preallocated_output_ndim (<class 'int'>)
Doc: When testing with "strided" preallocated output memory, test all combinations of strides over that number of (inner-most) dimensions. You may want to reduce that number to reduce memory or time usage, but it is advised to keep a minimum of 2.
Value: 4
profiling__time_thunks (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c95d10>>)
Doc: Time individual thunks when profiling
Value: True
profiling__n_apply (<class 'int'>)
Doc: Number of Apply instances to print by default
Value: 20
profiling__n_ops (<class 'int'>)
Doc: Number of Ops to print by default
Value: 20
profiling__output_line_width (<class 'int'>)
Doc: Max line width for the profiling output
Value: 512
profiling__min_memory_size (<class 'int'>)
Doc: For the memory profile, do not print Apply nodes if the size
of their outputs (in bytes) is lower than this threshold
Value: 1024
profiling__min_peak_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105928590>>)
Doc: The min peak memory usage of the order
Value: False
profiling__destination (<class 'str'>)
Doc: File destination of the profiling output
Value: stderr
profiling__debugprint (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105be0d50>>)
Doc: Do a debugprint of the profiled functions
Value: False
profiling__ignore_first_call (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c95dd0>>)
Doc: Do we ignore the first call of an PyTensor function.
Value: False
on_shape_error ({'warn', 'raise'})
Doc: warn: print a warning and use the default value. raise: raise an error
Value: warn
openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c8bd10>>)
Doc: Allow (or not) parallel computation on the CPU with OpenMP. This is the default value used when creating an Op that supports OpenMP parallelization. It is preferable to define it via the PyTensor configuration file ~/.pytensorrc or with the environment variable PYTENSOR_FLAGS. Parallelization is only done for some operations that implement it, and even for operations that implement parallelism, each operation is free to respect this flag or not. You can control the number of threads used with the environment variable OMP_NUM_THREADS. If it is set to 1, we disable openmp in PyTensor by default.
Value: False
openmp_elemwise_minsize (<class 'int'>)
Doc: If OpenMP is enabled, this is the minimum size of vectors for which the openmp parallelization is enabled in element wise ops.
Value: 200000
optimizer_excluding (<class 'str'>)
Doc: When using the default mode, we will remove optimizer with these tags. Separate tags with ':'.
Value:
optimizer_including (<class 'str'>)
Doc: When using the default mode, we will add optimizer with these tags. Separate tags with ':'.
Value:
optimizer_requiring (<class 'str'>)
Doc: When using the default mode, we will require optimizer with these tags. Separate tags with ':'.
Value:
optdb__position_cutoff (<class 'float'>)
Doc: Where to stop earlier during optimization. It represent the position of the optimizer where to stop.
Value: inf
optdb__max_use_ratio (<class 'float'>)
Doc: A ratio that prevent infinite loop in EquilibriumGraphRewriter.
Value: 8.0
cycle_detection ({'regular', 'fast'})
Doc: If cycle_detection is set to regular, most inplaces are allowed,but it is slower. If cycle_detection is set to faster, less inplacesare allowed, but it makes the compilation faster.The interaction of which one give the lower peak memory usage iscomplicated and not predictable, so if you are close to the peakmemory usage, triyng both could give you a small gain.
Value: regular
check_stack_trace ({'raise', 'warn', 'off', 'log'})
Doc: A flag for checking the stack trace during the optimization process. default (off): does not check the stack trace of any optimization log: inserts a dummy stack trace that identifies the optimizationthat inserted the variable that had an empty stack trace.warn: prints a warning if a stack trace is missing and also a dummystack trace is inserted that indicates which optimization insertedthe variable that had an empty stack trace.raise: raises an exception if a stack trace is missing
Value: off
metaopt__verbose (<class 'int'>)
Doc: 0 for silent, 1 for only warnings, 2 for full output withtimings and selected implementation
Value: 0
unittests__rseed (<class 'str'>)
Doc: Seed to use for randomized unit tests. Special value 'random' means using a seed of None.
Value: 666
warn__round (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c961d0>>)
Doc: Warn when using `tensor.round` with the default mode. Round changed its default from `half_away_from_zero` to `half_to_even` to have the same default as NumPy.
Value: False
profile (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105adadd0>>)
Doc: If VM should collect profile information
Value: False
profile_optimizer (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c96190>>)
Doc: If VM should collect optimizer profile information
Value: False
profile_memory (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105b7a910>>)
Doc: If VM should collect memory profile information and print it
Value: False
<pytensor.configparser.ConfigParam object at 0x105c8bc90>
Doc: Useful only for the VM Linkers. When lazy is None, auto detect if lazy evaluation is needed and use the appropriate version. If the C loop isn't being used and lazy is True, use the Stack VM; otherwise, use the Loop VM.
Value: None
numba__vectorize_target ({'cuda', 'parallel', 'cpu'})
Doc: Default target for numba.vectorize.
Value: cpu
numba__fastmath (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c96390>>)
Doc: If True, use Numba's fastmath mode.
Value: True
numba__cache (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105c96350>>)
Doc: If True, use Numba's file based caching.
Value: True
compiledir_format (<class 'str'>)
Doc: Format string for platform-dependent compiled module subdirectory
(relative to base_compiledir). Available keys: device, gxx_version,
hostname, numpy_version, platform, processor, pytensor_version,
python_bitwidth, python_int_bitwidth, python_version, short_platform.
Defaults to compiledir_%(short_platform)s-%(processor)s-
%(python_version)s-%(python_bitwidth)s.
Value: compiledir_%(short_platform)s-%(processor)s-%(python_version)s-%(python_bitwidth)s
<pytensor.configparser.ConfigParam object at 0x1059163d0>
Doc: platform-independent root directory for compiled modules
Value: /Users/johnnie/.pytensor
<pytensor.configparser.ConfigParam object at 0x105928a90>
Doc: platform-dependent cache directory for compiled modules
Value: /Users/johnnie/.pytensor/compiledir_macOS-15.5-arm64-arm-64bit-arm-3.11.13-64
blas__ldflags (<class 'str'>)
Doc: lib[s] to include for [Fortran] level-3 blas implementation
Value:
blas__check_openmp (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x10601e510>>)
Doc: Check for openmp library conflict.
WARNING: Setting this to False leaves you open to wrong results in blas-related operations.
Value: True
scan__allow_gc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x105b7a350>>)
Doc: Allow/disallow gc inside of Scan (default: False)
Value: False
scan__allow_output_prealloc (<bound method BoolParam._apply of <pytensor.configparser.BoolParam object at 0x122a69850>>)
Doc: Allow/disallow memory preallocation for outputs inside of scan (default: True)
Value: True
I thought this was sorted already. One of these 3 guys might have an idea.
I summon @jessegrabowski @maresb @lucianopaz
Can you try adding the following to your .pytensorrc (it should be in your home directory):
[gcc]
cxxflags = -Wno-c++11-narrowing
I do this on my mac, but I don't have any insights into why it is necessary.
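For anyone who would rather not edit .pytensorrc, the same flag can probably be passed through the PYTENSOR_FLAGS environment variable mentioned in the config output above. The sketch below is only an illustration: `merge_pytensor_flags` is a hypothetical helper (not part of PyTensor), and it assumes the flag is spelled `gcc__cxxflags` in PYTENSOR_FLAGS, as it is in the config listing, and that entries are comma-separated `key=value` pairs.

```python
import os


def merge_pytensor_flags(existing: str, key: str, value: str) -> str:
    """Return a PYTENSOR_FLAGS string with `key=value` set.

    Hypothetical helper for illustration. Any earlier entry for `key`
    is dropped; all other entries are kept in their original order.
    """
    pairs = [p for p in existing.split(",") if p and not p.startswith(key + "=")]
    pairs.append(f"{key}={value}")
    return ",".join(pairs)


# The variable has to be set before pytensor is imported,
# because the configuration is read at import time.
os.environ["PYTENSOR_FLAGS"] = merge_pytensor_flags(
    os.environ.get("PYTENSOR_FLAGS", ""),
    "gcc__cxxflags",
    "-Wno-c++11-narrowing",
)

# import pytensor  # import only after the environment variable is set
```

This only silences the narrowing diagnostic in clang; it does not explain why the generated C++ triggers it on this platform.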
We should come up with a plan on how to troubleshoot this at some point. I wonder if Google Cloud can provision Apple Silicon machines, then I'd have a handle to get started.
I think GitHub's macOS runners are already Apple Silicon rather than Intel.
Thanks for the tips. If I add cxxflags = -Wno-c++11-narrowing, and also make sure that clang++ is the system version, not from conda, then it indeed compiles. This second requirement I suppose is related to https://github.com/pymc-devs/pytensor/issues/1342.
It's odd that the conda clang fails. Can you check which version each one is?
System clang++
Apple clang version 17.0.0 (clang-1700.0.13.5)
Target: arm64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Conda clang++
clang version 19.1.7
Target: arm64-apple-darwin24.5.0
Thread model: posix
InstalledDir: /Users/johnnie/miniforge3/envs/pytensor/bin
It may be worth noting that installing clang=17 from conda does not fix the issue either.
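In case it helps others report the same information, here is a small sketch that collects the version line from each compiler in one go. It assumes the system compiler lives at /usr/bin/clang++ and that the conda one is whichever clang++ comes first on PATH; both are guesses about a typical setup, and `clang_version_line` is a hypothetical helper name.

```python
import shutil
import subprocess


def clang_version_line(version_output: str) -> str:
    """Return the line of `clang++ --version` output that names the version."""
    for line in version_output.splitlines():
        if "clang version" in line:
            return line.strip()
    return ""


if __name__ == "__main__":
    # Assumed locations: the system compiler and the first clang++ on PATH.
    candidates = {
        "system": "/usr/bin/clang++",
        "PATH (conda env, if active)": shutil.which("clang++"),
    }
    for label, path in candidates.items():
        if not path:
            print(f"{label}: not found")
            continue
        try:
            result = subprocess.run(
                [path, "--version"], capture_output=True, text=True, timeout=30
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            print(f"{label}: could not run {path} ({exc})")
            continue
        print(f"{label}: {path}: {clang_version_line(result.stdout)}")
```

Running it inside and outside the conda environment should show whether PyTensor's `cxx` setting points at the compiler you expect.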
Yeah it's probably other settings or linked libraries. @maresb any query that would be useful?
I thought this had something to do with the Xcode command line tools, but I haven't been able to track it down. I read on Stack Overflow that there can be linking errors if you do a system migration from an Intel Mac to an ARM one with Xcode installed, which I did. I uninstalled and reinstalled everything, but I haven't solved the issue.
Every time I install pytensor on my Mac, I have to force uninstall the clang tools that are installed, as described here.
I think we need to somehow provision an Apple Silicon VM that @lucianopaz, @ricardoV94, and I can hammer away at to figure out what's going on. Troubleshooting this by asking people to run commands for me has not been productive.