[QUESTION] Python osd_into() won't accept batch elements
What is your question?
I am trying to overlay lines on video frames in a performant way using osd_into(), but I am having trouble constructing the elements argument in a form the function will accept.
Docker container (NVIDIA CUDA 11.8 with Ubuntu 22.04), NVIDIA RTX A5000 Ada.
Workflow:
VPF decoding -> YOLO inference -> results processing -> CVCUDA post-processing -> overlay (osd_into)
Data: PyTorch tensor of frames -> cvcuda tensor; list of lists for cvcuda.Line -> cvcuda.Elements
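For clarity, this is the nested structure I am aiming for before constructing cvcuda.Elements (sketch only; line_a, line_b, line_c are placeholders for cvcuda.Line objects I build from the regression results):

# Sketch of the intended structure: one inner list of cvcuda.Line objects per batch frame.
batch_elements = [
    [line_a, line_b],  # frame 0: two regression lines
    [],                # frame 1: no lines for this frame
    [line_c],          # frame 2: one regression line
]
elements = cvcuda.Elements(batch_elements)  # then handed to cvcuda.osd_into(...)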
Here is my method that pulls the tensors and elements together and passes them to osd_into() as arguments:
def _process_with_cvcuda(self):
    """Process batches from the queue and overlay regression lines using CVCUDA OSD."""
    try:
        # Initialize PyCUDA Stream
        self.cuda_ctx = cuda.Device(self.gpu_id).make_context()
        logging.info(f"Activated CUDA context on device {self.gpu_id} for processing thread.")
        cuda_stream = cuda.Stream()

        while self.running:
            try:
                # Step 1: Get a batch from the shared queue
                batch_data = self.shared_queue.get(timeout=1)
                if batch_data is None:
                    logging.info("CVCUDA processor: Received EOF marker. Stopping.")
                    break

                # Step 2: Unpack batch data
                batch_tensor, frame_ids = batch_data
                batch_size = len(frame_ids)

                # Ensure the tensor is uint8 before conversion
                batch_tensor = batch_tensor.to(torch.uint8)

                # Log Tensor Information
                logging.info(f"Processing Batch - Frame Count: {batch_size}")
                logging.info(f" - Shape: {batch_tensor.shape}")
                logging.info(f" - Dtype: {batch_tensor.dtype}")
                logging.info(f" - Device: {batch_tensor.device}")
                logging.info(f" - Memory Pointer: {batch_tensor.data_ptr()}")

                # Convert PyTorch tensor to NVCV tensor
                frames_nvcv = cvcuda.as_tensor(batch_tensor)

                # Log Converted Tensor Info
                logging.info(f"Converted frames_nvcv Tensor: {frames_nvcv}")
                logging.info(f" - Shape: {frames_nvcv.shape}")
                logging.info(f" - Dtype: {frames_nvcv.dtype}")

                # Step 3: Fetch & Package `cvcuda.Line` Objects for Each Frame
                batch_elements = []
                for idx, frame_id in enumerate(frame_ids):
                    regression_lines = self.gpu_regression_results.get(frame_id, [])
                    if not isinstance(regression_lines, list):
                        logging.warning(f"Frame {frame_id}: Expected list, got {type(regression_lines)}. Replacing with empty list.")
                        regression_lines = []
                    if not all(isinstance(line, cvcuda.Line) for line in regression_lines):
                        logging.error(f"Frame {frame_id}: Some elements are not cvcuda.Line objects. Using empty list.")
                        regression_lines = []
                    # Append as a nested list to ensure correct format
                    batch_elements.append(regression_lines)
                    #logging.info(f"Frame {frame_id}: Added {len(regression_lines)} line(s)")

                # Log batch elements structure
                logging.info(f"Batch Elements Length: {len(batch_elements)}")
                for i, frame_lines in enumerate(batch_elements[:3]):  # Log first 3 frames
                    logging.info(f"Frame {frame_ids[i]}: Contains {len(frame_lines)} lines.")
                    #for line in frame_lines[:3]:  # Log first 3 lines
                    #    logging.info(f" - Line Object: {line}")

                # Construct cvcuda.Elements object
                elements = cvcuda.Elements(batch_elements)
                logging.info(f"Attributes of elements: {dir(elements)}")
                logging.info("Successfully created cvcuda.Elements object.")

                # Extract Capsule from Elements
                capsule_elements = elements.capsule() if hasattr(elements, "capsule") else elements
                logging.info(f"Extracted Capsule Type: {type(capsule_elements)}")

                # Ensure CUDA stream is synchronized before OSD
                cuda_stream.synchronize()

                # Apply OSD Overlay
                cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=capsule_elements, stream=cuda_stream.handle)
                logging.info("Successfully applied OSD overlay.")

                # Convert back to PyTorch tensor
                result_tensor = torch.tensor(frames_nvcv.cuda(), dtype=torch.uint8)

                # Put processed batch in output queue
                self.output_queue.put((result_tensor, frame_ids))
                logging.info(f"CVCUDA processor: Processed batch with frame IDs: {frame_ids}")

            except queue.Empty:
                continue
            except Exception as e:
                logging.error(f"Error in CVCUDA processing: {e}")
                import traceback
                traceback.print_exc()
    except Exception as e:
        logging.error(f"CVCUDA processor initialization error: {e}")
        import traceback
        traceback.print_exc()
    finally:
        logging.info("CVCUDA processing thread exiting")
        self.output_queue.put(None)
This is the logging output with the error. What is the correct input structure for the elements argument?
2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Successfully created cvcuda.Elements object.
2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Extracted Capsule Type: <class 'cvcuda.Elements'>
2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Error in CVCUDA processing: osd_into(): incompatible function arguments. The following argument types are supported:
    1. (dst: nvcv.Tensor, src: nvcv.Tensor, elements: capsule, *, stream: Optional[nvcv.cuda.Stream] = None) -> nvcv.Tensor

Invoked with: kwargs: dst=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, src=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, elements=<cvcuda.Elements object at 0x7f524ca99db0>, stream=139993385369472
Traceback (most recent call last):
  File "/workspace/Core_Module_v2/Picasso.py", line 289, in _process_with_cvcuda
    cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=capsule_elements, stream=cuda_stream.handle)
TypeError: osd_into(): incompatible function arguments. The following argument types are supported:
    1. (dst: nvcv.Tensor, src: nvcv.Tensor, elements: capsule, *, stream: Optional[nvcv.cuda.Stream] = None) -> nvcv.Tensor

Invoked with: kwargs: dst=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, src=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, elements=<cvcuda.Elements object at 0x7f524ca99db0>, stream=139993385369472
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Processing Batch - Frame Count: 20
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Shape: torch.Size([20, 3, 1088, 1920])
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Dtype: torch.uint8
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Device: cuda:0
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Memory Pointer: 52506394624
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Converted frames_nvcv Tensor: <nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Shape: (20, 3, 1088, 1920)
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Dtype: uint8
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Batch Elements Length: 20
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 243: Contains 2 lines.
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 245: Contains 2 lines.
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 247: Contains 2 lines.
2025-03-09 03:38:28,032 [Thread-10 (_process_with_cvcuda)] Attributes of elements: ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_pybind11_conduit_v1_']
@peterrrrob97 are you sure the error is not because of passing cuda_stream.handle instead of a cvcuda stream here:
cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=capsule_elements, stream=cuda_stream.handle)
If you want to use externally allocated streams, you can use cvcuda.cuda.as_stream and pass the integer id or handle of the stream. See more here: #228
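Something along these lines, adapted from the snippet above (untested sketch: it assumes cvcuda.cuda.as_stream accepts the raw PyCUDA handle as described in #228, and that the cvcuda.Elements object you already build is passed to osd_into directly, without the capsule() step):

# Untested sketch: wrap the externally allocated PyCUDA stream for CV-CUDA
# instead of handing osd_into the raw integer handle.
cv_stream = cvcuda.cuda.as_stream(cuda_stream.handle)

# Pass the Elements object built from the per-frame lists of cvcuda.Line objects directly.
elements = cvcuda.Elements(batch_elements)
cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=elements, stream=cv_stream)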
@peterrrrob97 - Based on the response above, closing this issue for now. Feel free to re-open it if the issue still persists or you need more clarity. Thanks.