CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
Dear devs,
I recently updated the Holoscan SDK from v1.0.3 to v2.1.0. My application worked on the older SDK but stopped working after the update. It still runs with a video file (Path 1: replayer, ImageProcessing, preprocessor, inference, postprocessor, PostImageProcessing, viz), but when I run it with a live feed from the AJA source operator, the app fails with the error in the title. The error occurs exactly when the frame is preprocessed in the ImageProcessing operator, and it only happens with the AJA source, not with the replayer.
According to the new release notes, the update changed how FormatConverterOp handles host/device copies. I use CuPy to read the frame that arrives as a Holoscan Tensor, so I suspect the problem is that CuPy expects the data on the GPU, while FormatConverterOp now automatically performs a host-to-device copy. What can I do to fix this?
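To test this hypothesis, it can help to print where the incoming frame actually lives before CuPy touches it. The diagnostic below is a minimal sketch. describe_array and describe_message are hypothetical helper names. The sketch assumes that Holoscan tensors expose the standard __cuda_array_interface__ (device memory) or __array_interface__ (host memory) protocols. It also assumes that the object returned by op_input.receive("input_tensor") is either a mapping of tensor names to tensors or a single array-like object. Calling print(describe_message(message_frame)) right after the receive shows the tensor names, memory location, shape and dtype for each path.

```python
import numpy as np


def describe_array(obj):
    """Report where an array-like object's data lives.

    Uses only the standard interchange protocols: __cuda_array_interface__
    for CUDA device memory and __array_interface__ for host memory.
    """
    cai = getattr(obj, "__cuda_array_interface__", None)
    if cai is not None:
        return {
            "location": "device",
            "shape": tuple(cai["shape"]),
            "typestr": cai["typestr"],
            "pointer": cai["data"][0],
            # "stream" is optional (protocol v3); None means it was not provided.
            "stream": cai.get("stream"),
        }
    ai = getattr(obj, "__array_interface__", None)
    if ai is not None:
        return {
            "location": "host",
            "shape": tuple(ai["shape"]),
            "typestr": ai["typestr"],
            "pointer": ai["data"][0],
        }
    return {"location": "unknown", "type": type(obj).__name__}


def describe_message(message):
    """Describe each entry of a received message.

    Assumption: the message is either a mapping of tensor names to
    array-like objects, or a single array-like object.
    """
    if hasattr(message, "items"):
        return {name: describe_array(value) for name, value in message.items()}
    return {"<message>": describe_array(message)}


if __name__ == "__main__":
    # Stand-in for a received frame: a host-side RGBA image under the name "".
    fake_message = {"": np.zeros((1080, 1920, 4), dtype=np.uint8)}
    print(describe_message(fake_message))
```

Running the same call in the replayer pipeline provides a baseline to compare against. A different tensor name, memory location, channel count or dtype on the AJA path would show where the two paths diverge.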
[info] [gxf_executor.cpp:248] [AJA_arthrosegmentation] Creating context
[info] [gxf_executor.cpp:1691] Loading extensions from configs...
[warning] [gxf_resource.cpp:175] Existing entity already has a GPUDevice resource
[info] [gxf_executor.cpp:1897] Activating Graph...
[info] [gxf_executor.cpp:1929] [AJA_arthrosegmentation] Running Graph...
[info] [gxf_executor.cpp:1931] [AJA_arthrosegmentation] Waiting for completion...
2024-06-19 15:46:08.587 INFO gxf/std/greedy_scheduler.cpp@191: Scheduling 8 entities
[info] [aja_source.cpp:386] AJA Source: Capturing from NTV2_CHANNEL1
[info] [aja_source.cpp:387] AJA Source: RDMA is disabled
[info] [aja_source.cpp:393] AJA Source: Overlay output is disabled
[info] [infer_utils.cpp:222] Input tensor names empty from Config. Creating from pre_processor map.
[info] [infer_utils.cpp:224] Input Tensor names: [source_video]
[info] [infer_utils.cpp:258] Output tensor names empty from Config. Creating from inference map.
[info] [infer_utils.cpp:260] Output Tensor names: [output]
[info] [inference.cpp:208] Inference Specifications created
[info] [infer_manager.cpp:825] Inference context ID: AJA_arthrosegmentation_[]_
[info] [core.cpp:46] TRT Inference: converting ONNX model at ../data/arthroscopic_segmentation/model/model_full_image_for_clahe_converted.onnx
[info] [utils.cpp:76] Cached engine found: ../data/arthroscopic_segmentation/model/model_full_image_for_clahe_converted.Orin.8.7.16.trt.
[info] [core.cpp:79] Loading Engine: ../data/arthroscopic_segmentation/model/model_full_image_for_clahe_converted.Orin.8.7.16.trt.
[info] [utils.hpp:46] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
[info] [core.cpp:122] Engine loaded: ../data/arthroscopic_segmentation/model/model_full_image_for_clahe_converted.Orin.8.7.16.trt.
[info] [infer_manager.cpp:386] HoloInfer buffer created for output
[info] [inference.cpp:219] Inference context setup complete
error: XDG_RUNTIME_DIR not set in the environment.
[info] [context.cpp:50] _______________
[info] [context.cpp:50] Vulkan Version:
[info] [context.cpp:50] - available: 1.3.204
[info] [context.cpp:50] - requesting: 1.2.0
[info] [context.cpp:50] ______________________
[info] [context.cpp:50] Used Instance Layers :
[info] [context.cpp:50]
[info] [context.cpp:50] Used Instance Extensions :
[info] [context.cpp:50] VK_KHR_surface
[info] [context.cpp:50] VK_KHR_xcb_surface
[info] [context.cpp:50] VK_EXT_debug_utils
[info] [context.cpp:50] VK_KHR_external_memory_capabilities
[info] [context.cpp:50] ____________________
[info] [context.cpp:50] Compatible Devices :
[info] [context.cpp:50] 0: NVIDIA Tegra Orin (nvgpu)
[info] [context.cpp:50] Physical devices found :
[info] [context.cpp:50] 1
[info] [context.cpp:50] ________________________
[info] [context.cpp:50] Used Device Extensions :
[info] [context.cpp:50] VK_KHR_swapchain
[info] [context.cpp:50] VK_KHR_external_memory
[info] [context.cpp:50] VK_KHR_external_memory_fd
[info] [context.cpp:50] VK_KHR_external_semaphore
[info] [context.cpp:50] VK_KHR_external_semaphore_fd
[info] [context.cpp:50] VK_KHR_push_descriptor
[info] [context.cpp:50] VK_EXT_line_rasterization
[info] [context.cpp:50]
[info] [vulkan_app.cpp:845] Using device 0: NVIDIA Tegra Orin (nvgpu) (UUID 40d49d1be05a5cd98e6a4eb6cbd06e34)
frame count: 0
[error] [gxf_wrapper.cpp:84] Exception occurred for operator: 'ImageProcessing' - CUDARuntimeError: cudaErrorIllegalAddress: an illegal memory access was encountered
cupy_backends/cuda/api/runtime.pyx(144): cupy_backends.cuda.api.runtime.check_status
/usr/local/lib/python3.10/dist-packages/cupy/_creation/from_data.py(75): asarray
/opt/nvidia/holoscan/examples/MyModel_laser_segmentation/python/AJA_arthrosegmentation_debugging.py(158): compute
2024-06-19 15:46:09.300 ERROR gxf/std/entity_executor.cpp@552: Failed to tick codelet ImageProcessing in entity: ImageProcessing code: GXF_FAILURE
2024-06-19 15:46:09.300 WARN gxf/std/greedy_scheduler.cpp@243: Error while executing entity 26 named 'ImageProcessing': GXF_FAILURE
[info] [utils.hpp:46] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [defaultAllocator.cpp::deallocate::61] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[info] [utils.hpp:46] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
My code (the compute method of ImageProcessingOp):
# Module-level imports assumed by this method (not shown in the original snippet)
import time

import cupy as cp
import cv2
import holoscan as hs
import numpy as np
import nvtx
from holoscan.gxf import Entity


def compute(self, op_input, op_output, context):
    global global_range_start
    with nvtx.annotate(message="Image Processing", color="blue"):
        # Record the start time
        start_time = time.time()
        ## Preprocess file
        image_size = 1024
        resize_size = (1920, 1080)  # (width, height)
        self.final_size = (image_size, image_size)
        # Load the input tensor/original image
        message_frame = op_input.receive("input_tensor")  # Receive the input tensor
        print("frame count:", self.framecount)
        input_tensor = message_frame.get("")
        frame = cp.asnumpy(input_tensor)  # Copy the frame into a host-side NumPy array
        #frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        # Save input frame
        i = self.framecount
        # dir = "DebuggingImageProcessing"
        # os.makedirs(dir, exist_ok=True)
        # filename_in = os.path.join(dir, f"frame_in{i}.png")
        # cv2.imwrite(filename_in, cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
        #print("Type of 'np_frame_array':", np_frame_array)
        #print("PREPROCESSING: Shape of 'frame in':", frame.shape)
        #print("PREPROCESSING: dtype of 'frame in': ", frame.dtype)
        #assert isinstance(frame, np.ndarray)
        self.original_size = tuple(reversed(frame.shape[:-1]))
        self.resized_size = resize_size
        start_time_preprocessing = time.time()
        processed_frame = holoscan_preprocessing.run(frame)
        end_time_preprocessing = time.time()
        #print(f"time pre-processing: {end_time_preprocessing - start_time_preprocessing}", flush=True)  # Print the time taken for pre-processing in seconds
        # Python Preprocessing Code
        # load
        self.original_size = tuple(reversed(frame.shape[:-1]))
        # resize
        resized_image = ImageProcessingOp.resize(frame, size=resize_size, is_label=False)
        self.resized_size = tuple(reversed(resized_image.shape[:-1]))
        # crop image
        size_middle = (resize_size[0] - resize_size[1]) // 2  # 420
        self.crop_slice = slice(size_middle, resize_size[0] - size_middle)  # slice(420, 1500): 1080 columns
        cropped_image = resized_image[:, self.crop_slice]  # Crop the image
        self.cropped_size = tuple(reversed(cropped_image.shape[:-1]))  # (1080, 1080)
        processed_image, crop_mask = ImageProcessingOp.crop_outside_circle(cropped_image)  # Crop the outside of the circle
        roi_mask = ~crop_mask[..., np.newaxis]
        clahe = cv2.createCLAHE()
        # Apply CLAHE channel by channel, only to the ROI pixels (roi_mask)
        for channel_idx, channel in enumerate(np.moveaxis(processed_image, -1, 0).copy()):
            processed_image[roi_mask[..., 0], channel_idx] = np.squeeze(clahe.apply(channel[roi_mask[..., 0]]), axis=-1)
        # Arguments after processed_image are assumed: this line was cut off in the post
        processed_frame = ImageProcessingOp.resize(processed_image, size=self.final_size, is_label=False)
        self.image = processed_frame
        #print("PREPROCESSING: Shape of 'frame out resized':", processed_frame.shape)
        #print("PREPROCESSING: dtype of 'frame out resized': ", processed_frame.dtype)
        # # Record the end time
        # end_time = time.time()
        # # Calculate and print the FPS
        # fps = 1.0 / (end_time - start_time)
        # print(f"FPS pre-processing: {fps}", flush=True)
        #processed_frame = cv2.cvtColor(processed_frame, cv2.COLOR_RGB2BGR)
        # Save the preprocessed frame
        # filename_out = os.path.join(dir, f"frame_processed_out{i}.png")
        # cv2.imwrite(filename_out, cv2.cvtColor(processed_frame, cv2.COLOR_RGB2BGR))
        processed_frame = cp.asarray(processed_frame)  # Copy the result back to device memory
        self.framecount += 1
        out_message = Entity(context)
        # Send the processed frame to the output tensor
        # (tensor name "" is assumed; use the name the downstream operator expects)
        out_message.add(hs.as_tensor(processed_frame), "")
        op_output.emit(out_message, "output_tensor")
        # Start a new NVTX range and store it in the global variable
        global_range_start = nvtx.start_range(message="Inference", color="red")
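The resize-then-crop step in compute() reduces to simple arithmetic. With resize_size = (1920, 1080), size_middle = (1920 - 1080) // 2 = 420, so the crop keeps columns 420 to 1499 (slice(420, 1500)). That is 1080 columns, which gives a 1080 x 1080 image. The following is a minimal NumPy sketch of that centered square crop. center_crop_slice and center_crop are hypothetical helper names that are not part of the original code, and the sketch assumes frames are (height, width, channels) arrays with width >= height.

```python
import numpy as np


def center_crop_slice(width, height):
    """Column slice that keeps a centered height x height square of a
    landscape image whose shape is (height, width, ...)."""
    if width < height:
        raise ValueError("expected a landscape image (width >= height)")
    margin = (width - height) // 2
    # Ending at margin + height (rather than width - margin) keeps the
    # result exactly square even when width - height is odd.
    return slice(margin, margin + height)


def center_crop(image):
    """Crop an (H, W, ...) array with W >= H to its centered H x H square."""
    height, width = image.shape[:2]
    return image[:, center_crop_slice(width, height)]


if __name__ == "__main__":
    # Same numbers as in compute(): a 1920 x 1080 frame cropped to a square.
    print(center_crop_slice(1920, 1080))  # slice(420, 1500, None)
    frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
    print(center_crop(frame).shape)  # (1080, 1080, 3)
```

Using margin + height as the end index is a small deviation from the original resize_size[0] - size_middle expression. Both give the same slice whenever width - height is even, as it is for 1920 x 1080.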