Assert on "spec.GetSchema().IsSerializable()" failed: could not serialize the operator: NumbaFuncImpl
Version
1.30
Describe the bug.
I'm trying to use a custom numba_function operator to generate segmentation masks from a set of points representing a polygon. However, it fails with the following error:
RuntimeError: [/opt/dali/dali/pipeline/pipeline.cc:744] Assert on "spec.GetSchema().IsSerializable()" failed: Could not serialize the operator: NumbaFuncImpl
The pipeline fails regardless of the contents of setup_fn and run_fn.
It also happens with python_function.
Minimum reproducible example
```python
import numpy as np
import nvidia.dali as dali
import nvidia.dali.fn as fn
from nvidia.dali.plugin.numba.fn.experimental import numba_function


def create_segmentation_mask(mask: np.ndarray, polygon: np.ndarray) -> None:
    pass


def create_segmentation_mask_setup(outs: np.ndarray, ins: np.ndarray) -> None:
    pass


pipeline = dali.pipeline.Pipeline(
    batch_size=batch_size,
    num_threads=num_threads,
    device_id=device_id,
    prefetch_queue_depth=1,
)
with pipeline:
    inputs = fn.readers.tfrecord(
        path=tfrecord_files,
        index_path=tfrecord_index_files,
        features={
            "detector_points": dali.tfrecord.FixedLenFeature(
                [8],
                dali.tfrecord.float32,
                0.0,
            ),
        },
    )
    points = inputs["detector_points"]
    points = fn.reshape(points, shape=[-1, 2])
    targets = numba_function(
        points,
        run_fn=create_segmentation_mask,
        setup_fn=create_segmentation_mask_setup,
        out_types=[dali.types.DALIDataType.FLOAT],
        in_types=[dali.types.DALIDataType.FLOAT],
        outs_ndim=[2],
        ins_ndim=[2],
        device="cpu",
    )
    targets = targets.gpu()
pipeline.set_outputs(targets)
```
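Since the reproduction stubs out both callbacks, here is a rough plain-Python sketch (hypothetical, not from the report) of what a run_fn that rasterizes a polygon into a binary mask might compute, using a simple ray-casting point-in-polygon test. It operates on nested lists for illustration, whereas the real run_fn would receive np.ndarray views from DALI and be compiled by Numba:

```python
# Hypothetical sketch of the mask-generation logic the stubbed run_fn
# could perform; plain Python on nested lists, not DALI/Numba code.

def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (list of (px, py))?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal ray at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def create_segmentation_mask(mask, polygon):
    """Fill mask (2-D list) with 1.0 inside the polygon, 0.0 outside."""
    for r in range(len(mask)):
        for c in range(len(mask[r])):
            # sample at the pixel centre
            mask[r][c] = 1.0 if point_in_polygon(c + 0.5, r + 0.5, polygon) else 0.0
```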
Relevant log output
```
Traceback (most recent call last):
  File "/home/username/.local/lib/python3.11/site-packages/nvidia/dali/plugin/tf.py", line 197, in serialize_pipeline
    return pipeline.serialize()
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/nvidia/dali/pipeline.py", line 1230, in serialize
    ret = self._pipe.SerializeToProtobuf()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [/opt/dali/dali/pipeline/pipeline.cc:744] Assert on "spec.GetSchema().IsSerializable()" failed: Could not serialize the operator: NumbaFuncImpl
Stacktrace (28 entries):
[frame 0]: /home/username/.local/lib/python3.11/site-packages/nvidia/dali/libdali.so(+0xde8db) [0x7f333bf9e8db]
[frame 1]: /home/username/.local/lib/python3.11/site-packages/nvidia/dali/libdali.so(dali::Pipeline::SerializeToProtobuf() const+0x2c6) [0x7f333c0b1846]
[frame 2]: /home/username/.local/lib/python3.11/site-packages/nvidia/dali/backend_impl.cpython-311-x86_64-linux-gnu.so(+0x41ddb) [0x7f332030cddb]
[frame 3]: /home/username/.local/lib/python3.11/site-packages/nvidia/dali/backend_impl.cpython-311-x86_64-linux-gnu.so(+0xc03ca) [0x7f332038b3ca]
[frame 4]: python() [0x5517cb]
[frame 5]: python(_PyObject_MakeTpCall+0x26c) [0x52cefc]
[frame 6]: python(_PyEval_EvalFrameDefault+0x7be) [0x539dde]
[frame 7]: python(_PyFunction_Vectorcall+0x173) [0x562683]
[frame 8]: python() [0x56a3ee]
[frame 9]: python() [0x52d420]
[frame 10]: python(PyObject_Call+0x1e4) [0x56cca4]
[frame 11]: python(_PyEval_EvalFrameDefault+0x4330) [0x53d950]
[frame 12]: python(_PyFunction_Vectorcall+0x173) [0x562683]
[frame 13]: python() [0x56a3ee]
[frame 14]: python(_PyObject_MakeTpCall+0x243) [0x52ced3]
[frame 15]: python(_PyEval_EvalFrameDefault+0x7be) [0x539dde]
[frame 16]: python() [0x60ec24]
[frame 17]: python(PyEval_EvalCode+0x97) [0x60e287]
[frame 18]: python() [0x62f74b]
[frame 19]: python() [0x62bc94]
[frame 20]: python() [0x640115]
[frame 21]: python(_PyRun_SimpleFileObject+0x194) [0x63f744]
[frame 22]: python(_PyRun_AnyFileObject+0x47) [0x63f4a7]
[frame 23]: python(Py_RunMain+0x2c9) [0x639f09]
[frame 24]: python(Py_BytesMain+0x2d) [0x5fdb5d]
[frame 25]: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f341bc4ad90]
[frame 26]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f341bc4ae40]
[frame 27]: python(_start+0x25) [0x5fd9e5]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/context/code/package/train/src/train.py", line 142, in <module>
    run_model_training(
  File "/context/code/package/train/src/train.py", line 72, in run_model_training
    dataset = create_dali_dataset(split_name, batch_size_per_worker)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/context/code/package/src/input_fn_creator.py", line 84, in create_dali_dataset
    return DALIDataset(
           ^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/nvidia/dali/plugin/tf.py", line 803, in __init__
    dataset_impl = _DALIDatasetImpl(pipeline, **kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/nvidia/dali/plugin/tf.py", line 457, in __init__
    self._pipeline_serialized = serialize_pipeline(pipeline)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/.local/lib/python3.11/site-packages/nvidia/dali/plugin/tf.py", line 199, in serialize_pipeline
    raise RuntimeError("Error during pipeline initialization. Note that some operators "
RuntimeError: Error during pipeline initialization. Note that some operators (e.g. Python Operators) cannot be used with TensorFlow Dataset API and DALIIterator.
```
Other/Misc.
It happens on both of the following setups:
- Python 3.11, TensorFlow 2.14, CUDA 11.8
- Python 3.8, TensorFlow 3.11, CUDA 11.3
Check for duplicates
- [X] I have searched the open bugs/issues and have found no duplicates for this bug report
Hi @cyanic-selkie,
I'm afraid that, due to the way DALI integrates with TensorFlow (via native code), it is not possible to execute Python-defined functions, including the Numba operator, in a pipeline that must be serialized for the TensorFlow plugin. For now, I'm afraid you need to implement this as a custom operator.
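If serialization is the only blocker, one possible direction (an untested sketch, not an official workaround) is to drive the pipeline from Python instead of through the TensorFlow DALIDataset, since only the native TF plugin path calls pipeline.serialize(). Here, num_iterations is a placeholder name, and pipeline is assumed to be the object built in the reproduction above:

```python
# Sketch only: running a pipeline directly from Python never serializes it,
# so Python-defined operators (numba_function, python_function) keep working.
pipeline.build()

num_iterations = 10  # placeholder: however many batches you need
for _ in range(num_iterations):
    (targets,) = pipeline.run()          # tuple of TensorLists, one per output
    batch = targets.as_cpu().as_array()  # copy the GPU output back to a host array
    # hand `batch` to TensorFlow manually, e.g. tf.convert_to_tensor(batch)
```

This trades the tight tf.data integration for flexibility: batches cross the Python boundary on every iteration, so it is slower than the native DALIDataset path.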