[QUESTION] Python osd_into() won't accept batch elements
What is your question?
I am trying to overlay lines on video frames in a performant way using osd_into(), but I am having trouble constructing the elements argument in a form the function will accept.
Docker container (NVIDIA CUDA 11.8 with Ubuntu 22.04), NVIDIA RTX A5000 Ada.
Workflow:
VPF decoding -> YOLO inference -> results processing -> CVCUDA post-processing -> overlay (osd_into)
Data: PyTorch tensor of frames -> cvcuda tensor; list of lists for cvcuda.Line -> cvcuda.Elements
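For clarity, this is the nested structure I am aiming for before constructing cvcuda.Elements (sketch only; line_a, line_b, line_c are placeholders for cvcuda.Line objects I build from the regression results):

# Sketch of the intended structure: one inner list of cvcuda.Line objects per batch frame.
batch_elements = [
    [line_a, line_b],  # frame 0: two regression lines
    [],                # frame 1: no lines for this frame
    [line_c],          # frame 2: one regression line
]
elements = cvcuda.Elements(batch_elements)  # then handed to cvcuda.osd_into(...)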
Here is my method that pulls the tensors and elements together and passes them to osd_into() as arguments:
def _process_with_cvcuda(self):
    """Process batches from the queue and overlay regression lines using CVCUDA OSD."""
    try:
        # Initialize PyCUDA Stream
        self.cuda_ctx = cuda.Device(self.gpu_id).make_context()
        logging.info(f"Activated CUDA context on device {self.gpu_id} for processing thread.")
        cuda_stream = cuda.Stream()

        while self.running:
            try:
                # Step 1: Get a batch from the shared queue
                batch_data = self.shared_queue.get(timeout=1)
                if batch_data is None:
                    logging.info("CVCUDA processor: Received EOF marker. Stopping.")
                    break

                # Step 2: Unpack batch data
                batch_tensor, frame_ids = batch_data
                batch_size = len(frame_ids)

                # Ensure the tensor is uint8 before conversion
                batch_tensor = batch_tensor.to(torch.uint8)

                # Log Tensor Information
                logging.info(f"Processing Batch - Frame Count: {batch_size}")
                logging.info(f" - Shape: {batch_tensor.shape}")
                logging.info(f" - Dtype: {batch_tensor.dtype}")
                logging.info(f" - Device: {batch_tensor.device}")
                logging.info(f" - Memory Pointer: {batch_tensor.data_ptr()}")

                # Convert PyTorch tensor to NVCV tensor
                frames_nvcv = cvcuda.as_tensor(batch_tensor)

                # Log Converted Tensor Info
                logging.info(f"Converted frames_nvcv Tensor: {frames_nvcv}")
                logging.info(f" - Shape: {frames_nvcv.shape}")
                logging.info(f" - Dtype: {frames_nvcv.dtype}")

                # Step 3: Fetch & Package `cvcuda.Line` Objects for Each Frame
                batch_elements = []
                for idx, frame_id in enumerate(frame_ids):
                    regression_lines = self.gpu_regression_results.get(frame_id, [])
                    if not isinstance(regression_lines, list):
                        logging.warning(f"Frame {frame_id}: Expected list, got {type(regression_lines)}. Replacing with empty list.")
                        regression_lines = []
                    if not all(isinstance(line, cvcuda.Line) for line in regression_lines):
                        logging.error(f"Frame {frame_id}: Some elements are not cvcuda.Line objects. Using empty list.")
                        regression_lines = []
                    # Append as a nested list to ensure correct format
                    batch_elements.append(regression_lines)
                    #logging.info(f"Frame {frame_id}: Added {len(regression_lines)} line(s)")

                # Log batch elements structure
                logging.info(f"Batch Elements Length: {len(batch_elements)}")
                for i, frame_lines in enumerate(batch_elements[:3]):  # Log first 3 frames
                    logging.info(f"Frame {frame_ids[i]}: Contains {len(frame_lines)} lines.")
                    #for line in frame_lines[:3]:  # Log first 3 lines
                    #    logging.info(f" - Line Object: {line}")

                # Construct cvcuda.Elements object
                elements = cvcuda.Elements(batch_elements)
                logging.info(f"Attributes of elements: {dir(elements)}")
                logging.info("Successfully created cvcuda.Elements object.")

                # Extract Capsule from Elements
                capsule_elements = elements.capsule() if hasattr(elements, "capsule") else elements
                logging.info(f"Extracted Capsule Type: {type(capsule_elements)}")

                # Ensure CUDA stream is synchronized before OSD
                cuda_stream.synchronize()

                # Apply OSD Overlay
                cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=capsule_elements, stream=cuda_stream.handle)
                logging.info("Successfully applied OSD overlay.")

                # Convert back to PyTorch tensor
                result_tensor = torch.tensor(frames_nvcv.cuda(), dtype=torch.uint8)

                # Put processed batch in output queue
                self.output_queue.put((result_tensor, frame_ids))
                logging.info(f"CVCUDA processor: Processed batch with frame IDs: {frame_ids}")

            except queue.Empty:
                continue
            except Exception as e:
                logging.error(f"Error in CVCUDA processing: {e}")
                import traceback
                traceback.print_exc()
    except Exception as e:
        logging.error(f"CVCUDA processor initialization error: {e}")
        import traceback
        traceback.print_exc()
    finally:
        logging.info("CVCUDA processing thread exiting")
        self.output_queue.put(None)
This is the logging output with the error. What is the correct input structure for the elements argument?
2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Successfully created cvcuda.Elements object.
2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Extracted Capsule Type: <class 'cvcuda.Elements'>
2025-03-09 03:38:28,016 [Thread-10 (_process_with_cvcuda)] Error in CVCUDA processing: osd_into(): incompatible function arguments. The following argument types are supported:
    1. (dst: nvcv.Tensor, src: nvcv.Tensor, elements: capsule, *, stream: Optional[nvcv.cuda.Stream] = None) -> nvcv.Tensor

Invoked with: kwargs: dst=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, src=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, elements=<cvcuda.Elements object at 0x7f524ca99db0>, stream=139993385369472
Traceback (most recent call last):
  File "/workspace/Core_Module_v2/Picasso.py", line 289, in _process_with_cvcuda
    cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=capsule_elements, stream=cuda_stream.handle)
TypeError: osd_into(): incompatible function arguments. The following argument types are supported:
    1. (dst: nvcv.Tensor, src: nvcv.Tensor, elements: capsule, *, stream: Optional[nvcv.cuda.Stream] = None) -> nvcv.Tensor

Invoked with: kwargs: dst=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, src=<nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>, elements=<cvcuda.Elements object at 0x7f524ca99db0>, stream=139993385369472
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Processing Batch - Frame Count: 20
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Shape: torch.Size([20, 3, 1088, 1920])
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Dtype: torch.uint8
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Device: cuda:0
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Memory Pointer: 52506394624
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Converted frames_nvcv Tensor: <nvcv.Tensor shape=(20, 3, 1088, 1920) dtype=uint8>
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Shape: (20, 3, 1088, 1920)
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] - Dtype: uint8
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Batch Elements Length: 20
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 243: Contains 2 lines.
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 245: Contains 2 lines.
2025-03-09 03:38:28,031 [Thread-10 (_process_with_cvcuda)] Frame 247: Contains 2 lines.
2025-03-09 03:38:28,032 [Thread-10 (_process_with_cvcuda)] Attributes of elements: ['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_pybind11_conduit_v1_']
@peterrrrob97 are you sure the error is not because of passing cuda_stream.handle instead of a cvcuda stream here:
cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=capsule_elements, stream=cuda_stream.handle)
If you want to use externally allocated streams, you can use cvcuda.cuda.as_stream and pass the integer id or handle of the stream. See more here: #228
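Something along these lines, adapted from the snippet above (untested sketch: it assumes cvcuda.cuda.as_stream accepts the raw PyCUDA handle as described in #228, and that the cvcuda.Elements object you already build is passed to osd_into directly, without the capsule() step):

# Untested sketch: wrap the externally allocated PyCUDA stream for CV-CUDA
# instead of handing osd_into the raw integer handle.
cv_stream = cvcuda.cuda.as_stream(cuda_stream.handle)

# Pass the Elements object built from the per-frame lists of cvcuda.Line objects directly.
elements = cvcuda.Elements(batch_elements)
cvcuda.osd_into(dst=frames_nvcv, src=frames_nvcv, elements=elements, stream=cv_stream)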
@peterrrrob97 - Based on the response above, closing this issue for now. Feel free to re-open it if the issue still persists or you need more clarity. Thanks.