bonnetal
terminate called after throwing an instance of 'c10::Error'
Hi, I am trying to run segmentation using a pretrained model. I am using Docker on Ubuntu 18.04 with a GPU. nvidia-smi works fine (but the whole GPU memory is already in use by a training job running in the background).
nvidia-docker run -ti --rm -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v $HOME/.Xauthority:/home/developer/.Xauthority -v /home/$USER:/home/$USER --net=host --pid=host -v /mnt/Data/dataset002mp4:/home/developer/dataset2 --ipc=host tano297/bonnetal:runtime /bin/bash
In docker:
cd deploy
catkin init
catkin build
cd ~/bonnetal/deploy/devel/lib/bonnetal_segmentation_standalone
./infer_img -p mapillary_darknet53_aspp_res_512_os8_40/ -i ~/dataset2/frames/00000001.jpg -v
I get:
================================================================================
image: /home/developer/dataset2/frames/00000001.jpg
path: mapillary_darknet53_aspp_res_512_os8_40//
backend: pytorch. Using default!
verbose: 1
================================================================================
Trying to open model
Could not send model to GPU, using CPU
terminate called after throwing an instance of 'c10::Error'
what(): open file failed, file path: mapillary_darknet53_aspp_res_512_os8_40///model.pytorch (FileAdapter at ../caffe2/serialize/file_adapter.cc:11)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6c (0x7f824a5e845c in /usr/local/lib/libc10.so)
frame #1: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x208 (0x7f82c2382538 in /usr/local/lib/libcaffe2.so)
frame #2: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x40 (0x7f824b1f9250 in /usr/local/lib/libtorch.so.1)
frame #3: bonnetal::segmentation::NetPytorch::NetPytorch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x3e1 (0x7f82c4e73171 in /home/developer/bonnetal/deploy/devel/.private/bonnetal_segmentation_lib/lib/libbonnetal_segmentation_lib.so)
frame #4: bonnetal::segmentation::make_net(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x3a6 (0x7f82c4e71926 in /home/developer/bonnetal/deploy/devel/.private/bonnetal_segmentation_lib/lib/libbonnetal_segmentation_lib.so)
frame #5: <unknown function> + 0x7dfe (0x55dc04edddfe in ./infer_img)
frame #6: __libc_start_main + 0xe7 (0x7f824bd55b97 in /lib/x86_64-linux-gnu/libc.so.6)
frame #7: <unknown function> + 0x87ea (0x55dc04ede7ea in ./infer_img)
Aborted (core dumped)
Anything obvious?
Is it related to there being no free GPU memory?
I also tried with CUDA_VISIBLE_DEVICES=''.
I was looking for an example of how to use the pretrained models, but haven't found any instructions.
I am finally going to use these models and present the results on YT.
I will be very grateful for any help.
BTW, I am using Docker because my host setup has ROS1 with catkin_make and no catkin command.
Hi,
Have you made the model directory deploy-ready? Check the instructions here.
After that you should have .pytorch and .onnx model files in the pretrained model's directory!
Let me know if it works like that.
Also, I haven't tried building the workspace with catkin_make, because we use catkin internally, but if you want to give that build a shot, let me know how it works. You may want to either clean your workspace or start from a fresh one with just this package inside. If you have TensorRT, you can benefit greatly from running the inference natively on your PC (Docker has some performance issues with GPUs; I haven't been able to make it run at 100% of the speed of my native Linux install).
I am back! I installed it on AWS.
While trying to convert the model I get:
bonnetal/train/tasks/segmentation$ ./make_deploy_model.py -p ~/mapillary_mobilenetsv2_aspp_res_512_os8_34 -l /tmp
----------
INTERFACE:
model path /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34
log dir /tmp
Height force None
Width force None
----------
Commit hash (training version): b'5368eed'
----------
model folder exists! Using model from /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34
[Errno 1] Operation not permitted: '.X11-unix'
Error creating log directory. Check permissions!
are you trying to empty the log directory?
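For reference, the `Operation not permitted: '.X11-unix'` failure is consistent with the script trying to clear its log directory: /tmp contains entries owned by other users (such as the root-owned .X11-unix socket directory), which an unprivileged user cannot delete. Assuming that behavior, a sketch of the safer pattern — a dedicated log directory the current user fully owns:

```python
import os
import shutil
import tempfile

# Assumption: make_deploy_model.py empties its -l directory before
# writing. Clearing a shared directory like /tmp hits entries owned
# by other users (e.g. the root-owned .X11-unix) and raises EPERM.
# A dedicated directory the current user owns can be cleared safely.
def fresh_log_dir(path):
    if os.path.isdir(path):
        shutil.rmtree(path)  # safe only because this dir is ours alone
    os.makedirs(path)
    return path

log_dir = fresh_log_dir(
    os.path.join(tempfile.gettempdir(), "bonnetal_deploy_logs"))
print(os.path.isdir(log_dir))  # → True
```

Pointing -l at any fresh, user-owned directory (as in the second try below) sidesteps the error.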
2nd try:
~/bonnetal/train/tasks/segmentation$ ./make_deploy_model.py -p ~/mapillary_mobilenetsv2_aspp_res_512_os8_34 -l ~/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/
----------
INTERFACE:
model path /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34
log dir /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/
Height force None
Width force None
----------
Commit hash (training version): b'5368eed'
----------
model folder exists! Using model from /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34
Opening config file /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/cfg.yaml
Original OS: 32
New OS: 8.0
[Decoder] os: 4 in: 48 skip: 24 out: 24
[Decoder] os: 2 in: 24 skip: 16 out: 16
[Decoder] os: 1 in: 16 skip: 3 out: 32
Successfully loaded model backbone weights
Successfully loaded model decoder weights
Successfully loaded model head weights
Total number of parameters: 2319082
Total number of parameters requires_grad: 0
Creating dummy input to profile
Saving config file /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted//cfg.yaml
Profiling model
saving model in /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/model.onnx
../..//backbones/mobilenetv2.py:147: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if y.shape[2] < x.shape[2] or y.shape[3] < x.shape[3]:
../..//backbones/mobilenetv2.py:149: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert(x.shape[2]/y.shape[2] == x.shape[3]/y.shape[3])
Checking that it all worked out
Profiling model
/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/jit/__init__.py:745: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error:
Not within tolerance rtol=1e-05 atol=1e-05 at input[0, 31, 15, 571] (0.05570845305919647 vs. 0.05571943521499634) and 0 other locations (0.00%)
_check_trace([example_inputs], func, executor_options, traced, check_tolerance, _force_outplace, False)
saving model in /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/model.pytorch
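The TracerWarnings in the export log mean torch.jit.trace records one concrete run: a Python `if` on tensor shapes is evaluated once with the dummy input, and the chosen branch is frozen into the graph as a constant. A plain-Python stand-in for the shape check in mobilenetv2.py (hypothetical helper names, for illustration only):

```python
# Tracing bakes Python-level shape decisions into the graph. This toy
# function mimics that: in "eager" mode the branch is re-evaluated per
# call, while a traced graph just replays whatever was recorded.
def upsample_decision(x_h, y_h, frozen_branch=None):
    if frozen_branch is None:        # eager: branch re-evaluated per call
        needs_pad = y_h < x_h
    else:                            # traced: decision frozen at export
        needs_pad = frozen_branch
    return "pad" if needs_pad else "no-op"

# Suppose the dummy input at trace time had y_h < x_h, freezing "pad":
print(upsample_decision(8, 4))                       # eager → "pad"
print(upsample_decision(4, 8))                       # eager → "no-op"
print(upsample_decision(4, 8, frozen_branch=True))   # traced → "pad"
```

So the warnings are benign as long as inference uses the same input resolution as the dummy input used for the export.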
I guess it's ok:
l ~/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted
cfg.yaml model.onnx model.pytorch
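About the tolerance warning in the conversion log above: it flags a single element (0.00% of locations) drifting just past the default trace-check tolerance, which is most likely ordinary float noise between the eager and traced runs, not a broken export. The check is essentially an allclose-style comparison (values copied from the warning):

```python
# Values copied from the trace-check warning; the comparison mirrors an
# allclose-style test with PyTorch's defaults rtol=1e-05, atol=1e-05.
eager, traced = 0.05570845305919647, 0.05571943521499634
rtol, atol = 1e-05, 1e-05
within = abs(eager - traced) <= atol + rtol * abs(traced)
print(within)  # → False: the difference exceeds the tolerance by ~4e-07
```

A miss that small at one location should not visibly affect the segmentation output.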
and:
./infer_video -p ~/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/ -b Pytorch --video ~/0002-20170519-2.mp4
================================================================================
video: /home/ubuntu/0002-20170519-2.mp4
path: /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted//
backend: Pytorch
verbose: 0
================================================================================
Trying to open model
Successfully opened model
Opening video/home/ubuntu/0002-20170519-2.mp4 for prediction.
================================================================================
Predicting frame: 0
================================================================================
================================================================================
Predicting frame: 1
================================================================================
================================================================================
Predicting frame: 2
================================================================================
================================================================================
Predicting frame: 3
How do I get the results? Where are the files saved?
In the Python version I see a log option, which is a directory to save the output to, and pytorch (instead of Pytorch) as the backend name:
./infer_video.py -p ~/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/ -b pytorch --video ~/0002-20170519-2.mp4 -l ~/mapillary_mobilenetsv2_aspp_res_512_os8_34/results/
----------
INTERFACE:
Video /home/ubuntu/0002-20170519-2.mp4
log dir /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/results/
model path /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/
backend pytorch
workspace 1000000000
Verbose False
Mask None
INT8 Calibration Images None
----------
Commit hash: b'5368eed'
----------
model folder exists! Using model from /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/
Opening config file /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted//cfg.yaml
Successfully Pytorch-traced model from /home/ubuntu/mapillary_mobilenetsv2_aspp_res_512_os8_34/converted/model.pytorch
Trying to open video: /home/ubuntu/0002-20170519-2.mp4
Finally, I am able to run it using Python. Thank you so much for the help!