isaac_ros_dnn_inference
GXF_OUT_OF_MEMORY with 12GB graphics card
Hi, we are trying to run FoundationPose using the Isaac ROS Docker container. We are facing an issue similar to this one: https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_dnn_inference/issues/29. @jaiveersinghNV mentioned that 8 GB of GPU memory might be too little, but we are using an NVIDIA RTX A2000 12GB graphics card. When we monitor GPU usage while launching the RealSense or rosbag example for FoundationPose, memory usage does not go above 6.5 GB, and we get the following error (all warnings are included as well):
[component_container_mt-1] 2024-08-22 11:39:42.353 WARN gxf/std/program.cpp@532: No GXF scheduler specified.
.
[component_container_mt-1] 2024-08-22 11:39:43.643 WARN gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dev_id' in component 'stream'.
.
[component_container_mt-1] 2024-08-22 11:39:46.508 WARN gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dummy_rx' in component ''.
[component_container_mt-1] 2024-08-22 11:39:46.509 WARN gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dummy_rx' in component ''.
[component_container_mt-1] 2024-08-22 11:39:46.509 WARN gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dev_id' in component 'stream'.
.
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
.
[component_container_mt-1] [INFO] [1724323186.518284133] [foundationpose_node]: [NitrosNode] Node was started
[component_container_mt-1] Could not open file
[component_container_mt-1] Could not open file
[component_container_mt-1] [ERROR] [1724323187.887672387] [TRT]: TRT ERROR: ModelImporter.cpp:733: Failed to parse ONNX model from file:
[component_container_mt-1] [ERROR] [1724323188.011007469] [tensor_rt]: Unable to read tensor shape info from TRT Model Engine or from ONNX file.
[component_container_mt-1] [WARN] [1724323188.011051778] [tensor_rt]: Failed to get block size from model, set to the default size: 67108864.
[component_container_mt-1] [INFO] [1724323188.011117585] [tensor_rt]: Tensors 67108864 bytes, num outputs 40 x tensors per output 3 = 120 blocks
[component_container_mt-1] [INFO] [1724323188.011197329] [tensor_rt]: [NitrosNode] Initializing and running GXF graph
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/std/block_memory_pool.cpp@77: Failure in cudaMalloc. cuda_error: cudaErrorMemoryAllocation, error_str: out of memory
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/std/entity_warden.cpp@437: Failed to initialize component 00157 (pool)
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/core/runtime.cpp@702: Could not initialize entity 'XMIYROHBPH_inference' (E152): GXF_OUT_OF_MEMORY
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/std/program.cpp@283: Failed to activate entity 00152 named XMIYROHBPH_inference: GXF_OUT_OF_MEMORY
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/std/program.cpp@285: Deactivating...
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/core/runtime.cpp@1452: Graph activation failed with error: GXF_OUT_OF_MEMORY
[component_container_mt-1] [ERROR] [1724323188.015653036] [tensor_rt]: [NitrosContext] GxfGraphActivate Error: GXF_OUT_OF_MEMORY
[component_container_mt-1] [ERROR] [1724323188.015668492] [tensor_rt]: [NitrosNode] runGraphAsync Error: GXF_OUT_OF_MEMORY
[component_container_mt-1] terminate called after throwing an instance of 'std::runtime_error'
[component_container_mt-1] what(): [NitrosNode] runGraphAsync Error: GXF_OUT_OF_MEMORY
[ERROR] [component_container_mt-1]: process has died [pid 922, exit code -6, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_mt --ros-args -r __node:=container -r __ns:=/isaac_ros_examples'].
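
For reference, here is a rough calculation of how much memory the failing pool seems to request, using only the numbers printed by `tensor_rt` in the log above (default block size 67108864 bytes, 40 outputs x 3 tensors per output). This is a sketch under the assumption that the pool is allocated as one contiguous block of `block size x number of blocks`; the function and variable names are made up for illustration and are not part of any Isaac ROS API.

```python
# Values copied from the tensor_rt log lines above.
BLOCK_SIZE_BYTES = 67_108_864   # default size used after the model could not be read
NUM_OUTPUTS = 40
TENSORS_PER_OUTPUT = 3


def pool_size_bytes(block_size: int, num_outputs: int, tensors_per_output: int) -> int:
    """Total bytes requested, assuming one allocation of block_size * num_blocks."""
    num_blocks = num_outputs * tensors_per_output
    return block_size * num_blocks


total_bytes = pool_size_bytes(BLOCK_SIZE_BYTES, NUM_OUTPUTS, TENSORS_PER_OUTPUT)
total_gib = total_bytes / 2**30

print(f"blocks: {NUM_OUTPUTS * TENSORS_PER_OUTPUT}")
print(f"requested: {total_bytes} bytes ({total_gib:.2f} GiB)")
```

If that assumption holds, the request is 8053063680 bytes (7.5 GiB) in a single `cudaMalloc`, which would not fit next to the memory already in use on a 12 GB card. The default block size only appears after the `Could not open file` and `Failed to parse ONNX model` messages, so the model or engine file path may be the actual cause.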