isaac_ros_dnn_inference

GXF_OUT_OF_MEMORY with 12GB graphics card

Open ammar-n-abbas opened this issue 1 year ago • 0 comments

Hi, we are trying to run FoundationPose using the Isaac ROS Docker container. We are facing an issue similar to this one: https://github.com/NVIDIA-ISAAC-ROS/isaac_ros_dnn_inference/issues/29. @jaiveersinghNV mentioned that 8 GB of GPU memory might be insufficient, but we are using an NVIDIA RTX A2000 12GB graphics card. When we monitor GPU usage while launching the RealSense or rosbag example for FoundationPose, memory usage does not go above 6.5 GB, yet we get the following error (all warnings are included as well):

[component_container_mt-1] 2024-08-22 11:39:42.353 WARN  gxf/std/program.cpp@532: No GXF scheduler specified.
.
[component_container_mt-1] 2024-08-22 11:39:43.643 WARN  gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dev_id' in component 'stream'.
.
[component_container_mt-1] 2024-08-22 11:39:46.508 WARN  gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dummy_rx' in component ''.
[component_container_mt-1] 2024-08-22 11:39:46.509 WARN  gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dummy_rx' in component ''.
[component_container_mt-1] 2024-08-22 11:39:46.509 WARN  gxf/std/yaml_file_loader.cpp@1077: Using unregistered parameter 'dev_id' in component 'stream'.
.
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
[component_container_mt-1] 2024-08-22 11:39:46.514 WARN  gxf/std/scheduling_terms.cpp@333: 'min_size' parameter in MultiMessageAvailableSchedulingTerm is deprecated. Use 'min_sum' with SumOfAll sampling mode instead
.
[component_container_mt-1] [INFO] [1724323186.518284133] [foundationpose_node]: [NitrosNode] Node was started
[component_container_mt-1] Could not open file 
[component_container_mt-1] Could not open file 
[component_container_mt-1] [ERROR] [1724323187.887672387] [TRT]: TRT ERROR: ModelImporter.cpp:733: Failed to parse ONNX model from file: 
[component_container_mt-1] [ERROR] [1724323188.011007469] [tensor_rt]: Unable to read tensor shape info from TRT Model Engine or from ONNX file.
[component_container_mt-1] [WARN] [1724323188.011051778] [tensor_rt]: Failed to get block size from model, set to the default size: 67108864.
[component_container_mt-1] [INFO] [1724323188.011117585] [tensor_rt]: Tensors 67108864 bytes, num outputs 40 x tensors per output 3 = 120 blocks
[component_container_mt-1] [INFO] [1724323188.011197329] [tensor_rt]: [NitrosNode] Initializing and running GXF graph
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/std/block_memory_pool.cpp@77: Failure in cudaMalloc. cuda_error: cudaErrorMemoryAllocation, error_str: out of memory
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/std/entity_warden.cpp@437: Failed to initialize component 00157 (pool)
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/core/runtime.cpp@702: Could not initialize entity 'XMIYROHBPH_inference' (E152): GXF_OUT_OF_MEMORY
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/std/program.cpp@283: Failed to activate entity 00152 named XMIYROHBPH_inference: GXF_OUT_OF_MEMORY
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/std/program.cpp@285: Deactivating...
[component_container_mt-1] 2024-08-22 11:39:48.015 ERROR gxf/core/runtime.cpp@1452: Graph activation failed with error: GXF_OUT_OF_MEMORY
[component_container_mt-1] [ERROR] [1724323188.015653036] [tensor_rt]: [NitrosContext] GxfGraphActivate Error: GXF_OUT_OF_MEMORY
[component_container_mt-1] [ERROR] [1724323188.015668492] [tensor_rt]: [NitrosNode] runGraphAsync Error: GXF_OUT_OF_MEMORY
[component_container_mt-1] terminate called after throwing an instance of 'std::runtime_error'
[component_container_mt-1]   what():  [NitrosNode] runGraphAsync Error: GXF_OUT_OF_MEMORY
[ERROR] [component_container_mt-1]: process has died [pid 922, exit code -6, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_mt --ros-args -r __node:=container -r __ns:=/isaac_ros_examples'].
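
A rough reading of the numbers in the log above, in case it helps narrow this down: the path after "Could not open file" is empty and the ONNX parse fails, so tensor_rt falls back to the default block size of 67108864 bytes and sizes the pool as 40 x 3 = 120 blocks. The sketch below only reproduces that arithmetic. It assumes the pool reserves all blocks up front in a single allocation, and it treats the 6.5 GB we observed and the card's 12 GB as GiB; both are assumptions on our side, not confirmed behavior.

```python
# Figures copied from the [tensor_rt] log lines above.
DEFAULT_BLOCK_BYTES = 67_108_864  # fallback block size reported in the log (64 MiB)
NUM_OUTPUTS = 40                  # "num outputs 40"
TENSORS_PER_OUTPUT = 3            # "tensors per output 3"

GIB = 2**30


def pool_bytes(block_bytes: int, num_outputs: int, tensors_per_output: int) -> int:
    """Total bytes requested if every block is reserved up front (assumption)."""
    return block_bytes * num_outputs * tensors_per_output


num_blocks = NUM_OUTPUTS * TENSORS_PER_OUTPUT
total_bytes = pool_bytes(DEFAULT_BLOCK_BYTES, NUM_OUTPUTS, TENSORS_PER_OUTPUT)
total_gib = total_bytes / GIB

# Assumed values: usage we observed before the crash and the nominal card size.
observed_in_use_gib = 6.5
card_capacity_gib = 12.0
fits = observed_in_use_gib + total_gib <= card_capacity_gib

print(f"{num_blocks} blocks x {DEFAULT_BLOCK_BYTES} bytes = {total_bytes} bytes")
print(f"pool request: {total_gib:.2f} GiB, fits next to observed usage: {fits}")
```

If this reading is right, the pool request alone is about 7.5 GiB, which would not fit alongside the memory already in use, and the empty model file path (rather than the card size) may be what triggers the oversized default.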

ammar-n-abbas, Aug 22 '24 11:08