🐛 [Bug] Segmentation fault when calling torchtrt::ts::compile in Torch-TensorRT
Bug Description
When I use the code below to compile a TorchScript model, a segmentation fault occurs.
I compiled the Torch-TensorRT source code in debug mode and ran the program under GDB.
I found that the error occurs on line 222 of shape_analysis.cpp: https://github.com/pytorch/TensorRT/blob/4b993f8ee30fd02b7ab9cff47114a0538562cf81/core/partitioning/shape_analysis.cpp#L222
Continuing to debug, I found that seg_block.raw_inputs() on line 182 is a std::vector of length 0, which leads to jit_inputs_ivalues on line 222 also being a std::vector of length 0.
https://github.com/pytorch/TensorRT/blob/4b993f8ee30fd02b7ab9cff47114a0538562cf81/core/partitioning/shape_analysis.cpp#L179-L222
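Based on these line references and frame #7 in the trace below, my understanding of the failing path is roughly the following (a simplified paraphrase with approximate names, not the literal upstream code): the argument list for the segment is built from seg_block.raw_inputs(), so an empty raw_inputs() means the mini-module's forward() is called with zero arguments.
// simplified paraphrase of getSegmentsOutputByRunning() around lines 179-222 (see link above);
// variable names are approximate, this is not the literal upstream code
std::vector<torch::jit::IValue> jit_inputs_ivalues;
for (auto& input : seg_block.raw_inputs()) {         // std::vector of length 0 here (line 182)
  jit_inputs_ivalues.push_back(ivalues_maps[input]); // so nothing is ever pushed
}
// with an empty argument list, this forward() call (frame #6 in the trace below)
// fails inside libtorch and surfaces as the segmentation fault
auto jit_results = mini_mod.forward(jit_inputs_ivalues);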
Here is a simplified version of my code:
#include <torch/script.h>
#include <torch_tensorrt/torch_tensorrt.h>

namespace torchtrt = torch_tensorrt;

// load the serialized TorchScript model (model_path) and move it to CUDA in half precision
torch::Device* device_ = new torch::Device(torch::DeviceType::CUDA);
device_->set_index(0);
torch::jit::script::Module model = torch::jit::load(model_path);
model.to("cuda");
model.eval();
model.to(torch::kHalf);

// static input: 1 x 3 x 832 x 1440, fp16
std::vector<int64_t> input_dim{1, 3, 832, 1440};
auto input = torchtrt::Input(input_dim, torchtrt::DataType::kHalf);

// compile settings: fp16 precision, 1 GB workspace, truncate long/double
size_t _1_GB = 1 << 30;
torchtrt::ts::CompileSpec compile_settings({ input });
compile_settings.enabled_precisions.insert(torchtrt::DataType::kHalf);
compile_settings.workspace_size = _1_GB;
compile_settings.truncate_long_and_double = true;
compile_settings.num_avg_timing_iters = 1;

// the segmentation fault occurs inside this call
torchtrt::ts::compile(model, compile_settings);
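As an additional sanity check, one could run the TorchScript module directly with a matching half-precision CUDA input before handing it to Torch-TensorRT (a minimal sketch, reusing the shape from input_dim above):
// optional sanity check (sketch): run the raw TorchScript module once with a
// half-precision CUDA input of the same shape as input_dim
std::vector<torch::jit::IValue> jit_inputs;
jit_inputs.push_back(torch::randn(
    {1, 3, 832, 1440},
    torch::TensorOptions().dtype(torch::kHalf).device(torch::kCUDA, 0)));
auto out = model.forward(jit_inputs);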
I can share the model with you so you can debug this error.
This is the stack trace:
#0 0x00007fffe3752699 in torch::jit::InterpreterStateImpl::callstack() const () at /usr/local/libtorch/lib/libtorch_cpu.so
#1 0x00007fffe375537c in torch::jit::InterpreterStateImpl::handleError(std::exception const&, bool, c10::NotImplementedError*, std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >) () at /usr/local/libtorch/lib/libtorch_cpu.so
#2 0x00007fffe3763fc4 in torch::jit::InterpreterStateImpl::runImpl(std::vector<c10::IValue, std::allocator<c10::IValue> >&) ()
at /usr/local/libtorch/lib/libtorch_cpu.so
#3 0x00007fffe374d156 in torch::jit::InterpreterState::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) ()
at /usr/local/libtorch/lib/libtorch_cpu.so
#4 0x00007fffe373e2c8 in torch::jit::GraphExecutorImplBase::run(std::vector<c10::IValue, std::allocator<c10::IValue> >&) ()
at /usr/local/libtorch/lib/libtorch_cpu.so
#5 0x00007fffe338e1b9 in torch::jit::Method::operator()(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&) const () at /usr/local/libtorch/lib/libtorch_cpu.so
#6 0x00007fff49a0b97e in torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, c10::IValue, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, c10::IValue> > > const&)
(this=0x7fff3d950d30, inputs=std::vector of length 0, capacity 0, kwargs=std::unordered_map with 0 elements)
at /usr/local/libtorch/include/torch/csrc/jit/api/module.h:116
#7 0x00007fff49a06589 in torch_tensorrt::core::partitioning::getSegmentsOutputByRunning(torch_tensorrt::core::partitioning::SegmentedBlock&, std::unordered_map<torch::jit::Value const*, c10::IValue, std::hash<torch::jit::Value const*>, std::equal_to<torch::jit::Value const*>, std::allocator<std::pair<torch::jit::Value const* const, c10::IValue> > >&, torch_tensorrt::core::partitioning::PartitioningInfo const&, torch_tensorrt::core::ir::ShapeMode const&)
(seg_block=..., ivalues_maps=std::unordered_map with 340 elements = {...}, partitioning_info=..., shape_mode=@0x7fff3d9512ac: torch_tensorrt::core::ir::ShapeMode::kOPT) at /workspace/Torch-TensorRT/core/partitioning/shape_analysis.cpp:222
#8 0x00007fff49a08627 in torch_tensorrt::core::partitioning::runShapeAnalysis(torch_tensorrt::core::partitioning::PartitioningCtx*, torch::jit::Block*, std::unordered_map<torch::jit::Value const*, c10::IValue, std::hash<torch::jit::Value const*>, std::equal_to<torch::jit::Value const*>, std::allocator<std::pair<torch::jit::Value const* const, c10::IValue> > >&, torch_tensorrt::core::ir::ShapeMode const&)
(ctx=0x7fff3d951860, block=0x7ffe81ad67b0, example_tensor_map=std::unordered_map with 340 elements = {...}, shape_mode=@0x7fff3d9512ac: torch_tensorrt::core::ir::ShapeMode::kOPT) at /workspace/Torch-TensorRT/core/partitioning/shape_analysis.cpp:354
#9 0x00007fff499ed4b5 in torch_tensorrt::core::partitioning::partition(torch_tensorrt::core::partitioning::PartitioningCtx*, bool)
(ctx=0x7fff3d951860, expect_full_compilation=false) at /workspace/Torch-TensorRT/core/partitioning/partitioning.cpp:607
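Also worth noting from the trace: frame #6 shows Module::forward being invoked with inputs=std::vector of length 0. A small standalone snippet (hypothetical, using any scripted model that expects a tensor argument and a placeholder path) shows what happens when forward() receives an empty argument list; normally this is reported as a thrown c10::Error, whereas in the crashing case the error-handling frames at the top of the trace fault instead:
#include <torch/script.h>
#include <iostream>

// hypothetical standalone illustration (not the crashing code itself): calling a
// scripted module's forward() with zero arguments makes libtorch raise an error
// about the missing inputs, which is the kind of error the handleError frames
// above are processing when the crash occurs
int main() {
  torch::jit::script::Module m = torch::jit::load("model.pt"); // placeholder path
  std::vector<torch::jit::IValue> empty_inputs;                // length 0, as in frame #6
  try {
    m.forward(empty_inputs); // the module's schema expects its declared inputs
  } catch (const c10::Error& e) {
    std::cerr << e.what() << std::endl; // normally an error message, not a segfault
  }
  return 0;
}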
Let me add something else: this is the value of seg_block:
(torch_tensorrt::core::partitioning::SegmentedBlock &) @0x7ffe820abff0: {id_ = 182,
target_ = torch_tensorrt::core::partitioning::SegmentedBlock::kTensorRT, min_shapes_ = std::vector of length 0, capacity 0,
opt_shapes_ = std::vector of length 0, capacity 0, max_shapes_ = std::vector of length 0, capacity 0,
in_types_ = std::vector of length 0, capacity 0, inputs_ = std::vector of length 0, capacity 0, outputs_ = std::vector of length 12, capacity 16 = {
0x7fff1ece1be0, 0x7ffe9226bd50, 0x7ffe807fdda0, 0x7ffe9353ff00, 0x7ffe93ee3ed0, 0x7ffe906eec30, 0x7ffe83e18b00, 0x7ffe93eba2b0, 0x7ffe92c758e0,
0x7ffe82c9c050, 0x7ffe819ea960, 0x7ffe93325580}, nodes_ = std::vector of length 44, capacity 44 = {0x7ffe83e7ae80, 0x7ffe83528ef0, 0x7ffe82c827f0,
0x7ffe82ce1010, 0x7ffe910037c0, 0x7ffe9071f7f0, 0x7ffe920cd4e0, 0x7ffe93518a00, 0x7ffe71722010, 0x7ffe82c66960, 0x7ffe932441f0, 0x7ffe9056f330,
0x7fff1f39f820, 0x7ffe921e7c00, 0x7ffe71964950, 0x7ffe9319aa60, 0x7ffe923da820, 0x7ffe71739210, 0x7ffe81198fc0, 0x7ffe923bc340, 0x7ffe9088eff0,
0x7ffe9172bb60, 0x7ffe92cde400, 0x7fff1ebfc690, 0x7fff1e2cd500, 0x7ffe923d89f0, 0x7ffe708e9e90, 0x7ffe82c25930, 0x7ffe90ef4430, 0x7fff1f710d40,
0x7fff1ee9e5f0, 0x7ffe93526720, 0x7ffe707150d0, 0x7ffe904c4720, 0x7ffe80e4b9d0, 0x7ffe706574f0, 0x7ffe92c3c0f0, 0x7fff1f014510, 0x7fff1ec493a0,
0x7ffe93e64310, 0x7ffe9293c660, 0x7ffe93e01ef0, 0x7ffe90f889f0, 0x7ffe835e4060},
g_ = std::shared_ptr<torch::jit::Graph> (use count 2, weak count 1) = {get() = 0x7ffe91987fb0}, old_to_new_ = std::unordered_map with 67 elements = {
[0x7ffe819ea960] = 0x7ffe80f0f130, [0x7ffe91aa6730] = 0x7ffe80f0eb40, [0x7ffe93e0f810] = 0x7ffe80f0e8a0, [0x7ffe717a7760] = 0x7ffe80f0e600,
[0x7fff1e0a6b50] = 0x7fff1e0b7490, [0x7ffe83e18cf0] = 0x7ffe929a7f40, [0x7ffe9226bd50] = 0x7ffe91989290, [0x7ffe92356dc0] = 0x7ffe833bcc90,
[0x7ffe920e29b0] = 0x7ffe929a7a70, [0x7ffe83ef4dc0] = 0x7ffe929a7870, [0x7ffe93347150] = 0x7ffe929a9220, [0x7ffe83e18b00] = 0x7ffe929a70b0,
[0x7ffe93eba2b0] = 0x7ffe929a7370, [0x7ffe906eec30] = 0x7ffe929a6e50, [0x7ffe910046c0] = 0x7ffe833bdd50, [0x7ffe82c66aa0] = 0x7ffe9224fe40,
[0x7ffe906ef280] = 0x7ffe92251570, [0x7ffe92c758e0] = 0x7ffe833bc4f0, [0x7ffe807fdda0] = 0x7ffe9224f130, [0x7fff1ecfcf80] = 0x7ffe833be110,
[0x7ffe906ef620] = 0x7ffe92251050, [0x7ffe93201590] = 0x7ffe922512b0, [0x7ffe923ebd60] = 0x7ffe80f0edf0, [0x7fff1edfc460] = 0x7ffe92250a00,
[0x7ffe8233ff10] = 0x7ffe9224fa80, [0x7ffe83529030] = 0x7ffe91988980, [0x7ffe82c9c050] = 0x7fff1e0b6310, [0x7ffe81a4e2f0] = 0x7ffe92250da0,
[0x7ffe83ef4e40] = 0x7ffe929a75f0, [0x7ffe82c82930] = 0x7ffe91988bd0, [0x7ffe92322910] = 0x7ffe833be7f0, [0x7ffe82ce1150] = 0x7ffe91988e10,
[0x7fff1ece1be0] = 0x7ffe92250080, [0x7ffe91003900] = 0x7ffe91989050, [0x7ffe9207db00] = 0x7ffe92250790, [0x7ffe93ee3ed0] = 0x7ffe9224f7c0,
[0x7ffe83ef4ba0] = 0x7ffe92250300, [0x7ffe81a4e580] = 0x7ffe833bd520, [0x7ffe806d50a0] = 0x7ffe919895a0, [0x7ffe9353ff00] = 0x7ffe9224f3d0,
[0x7ffe92228280] = 0x7ffe929a9440, [0x7ffe9353ef90] = 0x7ffe929a8040, [0x7ffe923acbe0] = 0x7ffe929a8320, [0x7ffe81a4ea10] = 0x7ffe929a8b90,
[0x7ffe93318a70] = 0x7ffe929a8970, [0x7ffe83e7afc0] = 0x7ffe91988660, [0x7fff1e2cd780] = 0x7ffe833bc250, [0x7ffe93325580] = 0x7ffe833bbfd0,
[0x7ffe719cf920] = 0x7ffe833bc750, [0x7ffe706f3570] = 0x7fff1e0b8510, [0x7ffe910467b0] = 0x7fff1e0b7a70, [0x7ffe93296510] = 0x7ffe833bd040,
[0x7ffe9102d3d0] = 0x7ffe929a7970, [0x7ffe835771b0] = 0x7ffe833bca10, [0x7fff1eb11620] = 0x7fff1e0b65b0, [0x7ffe906ee800] = 0x7fff1e0b67d0,
[0x7ffe7084f8a0] = 0x7ffe929a7cd0, [0x7ffe80613040] = 0x7ffe833bd300, [0x7ffe806d7b50] = 0x7ffe929a8fa0, [0x7ffe932a5e00] = 0x7ffe833bd8a0,
[0x7ffe91aa7310] = 0x7ffe833be660, [0x7ffe91f8b4a0] = 0x7ffe833bdb30, [0x7fff1dc708a0] = 0x7ffe833be390, [0x7ffe92319a20] = 0x7fff1e0b5f90,
[0x7ffe932d9910] = 0x7fff1e0b6be0, [0x7ffe906eedc0] = 0x7fff1e0b7080, [0x7fff1e3c1110] = 0x7fff1e0b6e60}, do_not_merge_ = false}
Environment
Build information about Torch-TensorRT can be found by turning on debug messages
- Torch-TensorRT Version (e.g. 1.0.0): latest source code, compiled from source
- PyTorch Version (e.g. 1.0): 2.2.1
- CPU Architecture: x86
- OS (e.g., Linux): Ubuntu 22.04
- How you installed PyTorch (conda, pip, libtorch, source):
- Build command you used (if compiling from source):
- Are you using local sources or building from archives:
- Python version:
- CUDA version: 12.2
- GPU models and configuration:
- Any other relevant information:
Hi @bowang007, any progress on this issue?
@narendasan, is this issue being resolved?
Hi @demuxin, it looks like after partitioning there is no input for one segmented block. Could you please try printing that segmented block and check why it has no inputs? If the graph genuinely has no inputs, then it is fine that the input vector's size is 0. If not, then I guess there might be a mapping issue for that segmented block's inputs. Thanks!
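For example, a quick way to inspect it would be a local check right before the segment is run, along these lines (a sketch, assuming the raw_inputs() and g() accessors already used in shape_analysis.cpp, and <iostream> for std::cerr):
// hypothetical local patch inside getSegmentsOutputByRunning(): dump the block
// when it unexpectedly has no inputs, before forward() is called
if (seg_block.raw_inputs().empty()) {
  std::cerr << "Segmented block has no inputs; its graph is:\n"
            << *seg_block.g() << std::endl;
}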