
The inference results are not good on Jetson TX2

Open newip opened this issue 6 years ago • 4 comments

Dear Rishizek, I am trying inference on an Nvidia Jetson TX2. It works, but the results are not good enough. Do you have any clue about possible reasons? Thanks for checking.

Information:

nvidia@tegra-ubuntu$ uname -r
4.4.38-tegra
nvidia@tegra-ubuntu$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="16.04.4 LTS (Xenial Xerus)"
nvidia@tegra-ubuntu:~/dev/github/tensorflow-deeplab-v3-plus$ python3 inference.py --data_dir '/home/nvidia/dev/dev_dataset/data_dir' --infer_data_list '/home/nvidia/dev/dev_dataset/data_dir/imagelist.txt' --model_dir '/home/nvidia/dev/pretrainedmodels/deeplabv3plus_ver1' --output_dir '/home/nvidia/dev/dev_dataset/data_dir'
2018-04-20 03:37:35.811927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-04-20 03:37:35.812095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.66GiB freeMemory: 5.99GiB
2018-04-20 03:37:35.812147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-20 03:37:37.122109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-20 03:37:37.122239: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-20 03:37:37.122271: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-20 03:37:37.122503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5442 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_num_ps_replicas': 0, '_save_summary_steps': 100, '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_tf_random_seed': None, '_task_id': 0, '_save_checkpoints_steps': None, '_model_dir': '/home/nvidia/dev/pretrainedmodels/deeplabv3plus_ver1', '_log_step_count_steps': 100, '_keep_checkpoint_max': 5, '_evaluation_master': '', '_task_type': 'worker', '_is_chief': True, '_global_id_in_cluster': 0, '_num_worker_replicas': 1, '_session_config': None, '_service': None, '_save_checkpoints_secs': 600, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f6c03e710>}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
2018-04-20 03:38:08.699511: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-20 03:38:08.699640: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-20 03:38:08.699675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2018-04-20 03:38:08.699729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2018-04-20 03:38:08.699834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5442 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
INFO:tensorflow:Restoring parameters from /home/nvidia/dev/pretrainedmodels/deeplabv3plus_ver1/model.ckpt-30358
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2018-04-20 03:38:32.594527: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 4.00G (4294967296 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
generating: /home/nvidia/dev/dev_dataset/data_dir/2007_000129_mask.png

(attached images: 2007_000129, 2007_000129_mask)
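The `CUDA_ERROR_OUT_OF_MEMORY` line in the log above comes from TensorFlow failing to grab a 4 GB block up front. One possible mitigation (a sketch, assuming the TF 1.x Estimator API this repo uses; the wiring into `inference.py` is hypothetical) is to pass a session config with `allow_growth` so GPU memory is allocated incrementally:

```python
import tensorflow as tf

# Allocate GPU memory incrementally instead of reserving a large block up front,
# which can help on a shared-memory device like the TX2.
session_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(allow_growth=True))

run_config = tf.estimator.RunConfig(session_config=session_config)

# Hypothetical wiring: pass run_config when constructing the Estimator, e.g.
#   tf.estimator.Estimator(model_fn=..., model_dir=FLAGS.model_dir,
#                          config=run_config)
```

Note the allocation failure alone does not explain bad mask quality, but running close to the memory limit is worth ruling out.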

newip avatar Apr 20 '18 03:04 newip

The model is big.


hexiangquan avatar Apr 21 '18 05:04 hexiangquan

Running the same command on a Titan GPU works well, as shown below. The model demands a lot of compute power...

(attached image: 2007_000129_mask_titan)

newip avatar Apr 24 '18 08:04 newip

Hi @newip , thank you for your interest in the repo. I have never run the model on a Jetson TX2, so I don't have a concrete answer. Inference should work even on the CPU of a regular PC. The following are my guesses, which may be incorrect:

  1. The input image size differs from the image size used for training. You may refer to here for details.
  2. The input images are converted from standard JPG files to something else before reaching the model.
  3. The Jetson TX2 may quantize the weights to increase inference performance, which would decrease the IoU, although I'm not sure it actually does so.
  4. The lack of GPU memory affected the model's performance. Maybe you can resize the input images to be smaller.

I hope this helps solve your problem.
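Regarding point 4, a minimal sketch of downscaling inputs before inference (pure Python for the size computation; `max_side=513` is an assumption based on DeepLab v3+'s usual training crop size, and the Pillow calls in the comments are hypothetical usage, not part of this repo):

```python
def fit_within(width, height, max_side=513):
    """Return (new_width, new_height) scaled down so that neither side
    exceeds max_side, preserving aspect ratio. Images that already fit
    are returned unchanged."""
    scale = min(1.0, float(max_side) / max(width, height))
    return int(round(width * scale)), int(round(height * scale))

# Hypothetical usage with Pillow before feeding images to inference.py:
# from PIL import Image
# img = Image.open("input.jpg")
# img = img.resize(fit_within(*img.size), Image.BILINEAR)
# img.save("input_small.jpg")
```

Downscaling both reduces GPU memory pressure and brings the inputs closer to the scale the network saw during training.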

rishizek avatar Apr 25 '18 15:04 rishizek

Thanks @rishizek , I will try with smaller JPG files.

newip avatar Apr 26 '18 05:04 newip