TensorRT
How to convert model from double to float
When I try to compile a TorchScript model, I get this log:
DEBUG: [TRTorch Conversion Context] - Found IValue containing object of type Double(requires_grad=0, device=cpu)
terminate called after throwing an instance of 'trtorch::Error'
what(): [enforce fail at core/util/trt_util.cpp:293] Expected aten_trt_type_map.find(t) != aten_trt_type_map.end() to be true but got false
Unsupported Aten datatype
So I tried to convert the model to float using this:
script_model = torch.jit.load(path)
script_model = script_model.eval()
script_model = script_model.float()
script_model.save(new_path)
And it still throws the same error.
can you provide more of the log / the graph?
Log file is here: double.log
gdb backtrace
Thread 1 "trtorchc" received signal SIGABRT, Aborted.
0x00007fff63987438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007fff63987438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007fff6398903a in __GI_abort () at abort.c:89
#2 0x00007ffff7a8ddde in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff7a99896 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff7a99901 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff7a99b55 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x000000000055a628 in trtorch::core::util::toTRTDataType (t=c10::ScalarType::Double) at core/util/trt_util.cpp:293
#7 0x000000000055a714 in trtorch::core::util::toTRTDataType (dtype=...) at core/util/trt_util.cpp:299
#8 0x000000000052f888 in trtorch::core::conversion::converters::Weights::Weights (this=0x7fffffffa300, ctx=0x7fffffffb010, t=...)
at core/conversion/converters/Weights.cpp:76
#9 0x000000000052e40e in trtorch::core::conversion::Var::ITensorOrFreeze (this=0x67243178, ctx=0x7fffffffb010) at core/conversion/var/Var.cpp:99
#10 0x00000000004a5f32 in trtorch::core::conversion::converters::impl::(anonymous namespace)::<lambda(trtorch::core::conversion::ConversionCtx*, const torch::jit::Node*, trtorch::core::conversion::converters::args&)>::operator()(trtorch::core::conversion::ConversionCtx *, const torch::jit::Node *, trtorch::core::conversion::converters::args &) const (__closure=0x7fffffffaae0, ctx=0x7fffffffb010, n=0x5e391a60, args=std::vector of length 2, capacity 2 = {...})
at core/conversion/converters/impl/element_wise.cpp:325
#11 0x00000000004aeb5e in std::_Function_handler<bool(trtorch::core::conversion::ConversionCtx*, const torch::jit::Node*, std::vector<trtorch::core::conversion::Var, std::allocator<trtorch::core::conversion::Var> >&), trtorch::core::conversion::converters::impl::(anonymous namespace)::<lambda(trtorch::core::conversion::ConversionCtx*, const torch::jit::Node*, trtorch::core::conversion::converters::args&)> >::_M_invoke(const std::_Any_data &, <unknown type in /home/vincent/.cache/bazel/_bazel_vincent/6ed89a01738d85eb9ea1a1afec2b71b5/execroot/TRTorch/bazel-out/k8-dbg/bin/cpp/trtorchc/trtorchc, CU 0x939ab0, DIE 0x9ff895>, <unknown type in /home/vincent/.cache/bazel/_bazel_vincent/6ed89a01738d85eb9ea1a1afec2b71b5/execroot/TRTorch/bazel-out/k8-dbg/bin/cpp/trtorchc/trtorchc, CU 0x939ab0, DIE 0x9ff8a4>, std::vector<trtorch::core::conversion::Var, std::allocator<trtorch::core::conversion::Var> > &) (__functor=...,
__args#0=<unknown type in /home/vincent/.cache/bazel/_bazel_vincent/6ed89a01738d85eb9ea1a1afec2b71b5/execroot/TRTorch/bazel-out/k8-dbg/bin/cpp/trtorchc/trtorchc, CU 0x939ab0, DIE 0x9ff895>,
__args#1=<unknown type in /home/vincent/.cache/bazel/_bazel_vincent/6ed89a01738d85eb9ea1a1afec2b71b5/execroot/TRTorch/bazel-out/k8-dbg/bin/cpp/trtorchc/trtorchc, CU 0x939ab0, DIE 0x9ff8a4>, __args#2=std::vector of length 2, capacity 2 = {...}) at /usr/include/c++/7/bits/std_function.h:302
#12 0x000000000047cd02 in std::function<bool (trtorch::core::conversion::ConversionCtx*, torch::jit::Node const*, std::vector<trtorch::core::conversion::Var, std::allocator<trtorch::core::conversion::Var> >&)>::operator()(trtorch::core::conversion::ConversionCtx*, torch::jit::Node const*, std::vector<trtorch::core::conversion::Var, std::allocator<trtorch::core::conversion::Var> >&) const (this=0x7fffffffaae0, __args#0=0x7fffffffb010, __args#1=0x5e391a60, __args#2=std::vector of length 2, capacity 2 = {...})
at /usr/include/c++/7/bits/std_function.h:706
#13 0x00000000004769d8 in trtorch::core::conversion::AddLayer (ctx=0x7fffffffb010, n=0x5e391a60) at core/conversion/conversion.cpp:116
#14 0x000000000047a91b in trtorch::core::conversion::ConvertBlockToNetDef (ctx=0x7fffffffb010, b=0x5e30a510, build_info=..., static_params=std::map with 0 elements)
at core/conversion/conversion.cpp:353
#15 0x000000000047adaf in trtorch::core::conversion::ConvertBlockToEngine[abi:cxx11](torch::jit::Block const*, trtorch::core::conversion::ConversionInfo, std::map<torch::jit::Value*, c10::IValue, std::less<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, c10::IValue> > >&) (b=0x5e30a510, build_info=...,
static_params=std::map with 0 elements) at core/conversion/conversion.cpp:380
#16 0x000000000045da60 in trtorch::core::ConvertGraphToTRTEngine (mod=..., method_name="forward", cfg=...) at core/compiler.cpp:150
#17 0x000000000045de12 in trtorch::core::CompileGraph (mod=..., cfg=...) at core/compiler.cpp:163
#18 0x000000000045b033 in trtorch::CompileGraph (module=..., info=...) at cpp/api/src/trtorch.cpp:31
#19 0x000000000042196c in main (argc=5, argv=0x7fffffffdf88) at cpp/trtorchc/main.cpp:382
This error is from the aten_to_trt_type mapping. A tensor in your graph has a double datatype, and TRT doesn't support datatypes wider than float. My guess: looking at your log, the second input to the mul node is a double constant.
%22 : Tensor = prim::Constant[value={0.2}]()
%523 : Tensor = aten::mul(%x5.1, %22)
Adding Layer %523 : Tensor = aten::mul(%x5.1, %22) # /data00/home/roy.he/projects/super_res/infere/models/modules/RRDBNet_arch.py:28:0 (ctx.AddLayer)
DEBUG: [TRTorch Conversion Context] - Node input is an already converted tensor
DEBUG: [TRTorch Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: Frozen tensor shape: [1, 64, 1000, 1000]
DEBUG: [TRTorch Conversion Context] - Found IValue containing object of type Double(requires_grad=0, device=cpu)
terminate called after throwing an instance of 'trtorch::Error'
what(): [enforce fail at core/util/trt_util.cpp:293] Expected aten_trt_type_map.find(t) != aten_trt_type_map.end() to be true but got false
Unsupported Aten datatype
Maybe this %22 is a double value, which might be causing the problem. If you have the source code, can you force this constant to be float, regenerate the TorchScript, and try again? There are other similar constants in your model, like the one below, which are represented as explicit float constants unlike the %22 tensor:
%13 : float = prim::Constant[value=0.20000000000000001]()
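For reference, a sketch of the kind of source-level fix being suggested. The 0.2 residual scale and the RRDBNet context come from the log above; the module below is a hypothetical reconstruction, not the poster's actual code:

```python
import torch

class ResidualScale(torch.nn.Module):
    """Hypothetical stand-in for the RRDBNet residual-scaling line."""
    def __init__(self, scale: float = 0.2):
        super().__init__()
        # Registering the scale as an explicit float32 buffer keeps it
        # float in the exported graph (and buffers are also covered by
        # .float()/.half(), unlike constants baked in at trace time).
        self.register_buffer("scale", torch.tensor(scale, dtype=torch.float32))

    def forward(self, x5, x):
        return x5 * self.scale + x

m = torch.jit.trace(ResidualScale(), (torch.randn(2), torch.randn(2)))
print(m(torch.randn(2), torch.randn(2)).dtype)  # torch.float32
```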
Yes, I tried to print the constant value in the script model using
script_model = torch.jit.load(path)
script_model.eval()
script_model.float()
print(script_model.forward.code_with_constants[1])
the output is
{'c0': tensor(0.2000, dtype=torch.float64), 'c1': tensor(2.)}
I also tried to modify the constant value with
script_model.code_with_constants[1].c0 = script_model.code_with_constants[1].c0.float()
but it didn't change the type. So the only way is to modify the original Python module definition? Is there nothing that can be done to change the constant type through TorchScript?
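One possible TorchScript-level workaround, sketched below: rewrite the Double tensor constants directly in the method graph. This relies on internal `torch._C` graph APIs (`findAllNodes`, `kindOf`, `t_`, `inferTypeFrom`), so treat it as a best-effort hack rather than a supported path:

```python
import torch

class Scale(torch.nn.Module):
    def forward(self, x):
        return x * torch.tensor(0.2, dtype=torch.float64)

m = torch.jit.trace(Scale(), torch.randn(4))

# Walk the graph and retype every Double tensor constant to float32.
for node in m.graph.findAllNodes("prim::Constant"):
    if "value" in node.attributeNames() and node.kindOf("value") == "t":
        t = node.t("value")
        if t.dtype == torch.float64:
            node.t_("value", t.float())             # replace the payload
            node.output().inferTypeFrom(t.float())  # fix the output type too

# The constant's tensor attribute is now float32.
const = next(n for n in m.graph.findAllNodes("prim::Constant")
             if "value" in n.attributeNames() and n.kindOf("value") == "t")
print(const.t("value").dtype)  # torch.float32
```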
@inocsin Can you try torch.fx (https://pytorch.org/docs/stable/fx.html#direct-graph-manipulation)? I'm not sure if that would work, but with it we can modify layers, add new layers, etc. Hopefully we can rewrite these constant layers to the desired precision?
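A rough sketch of what that could look like with torch.fx, assuming the double enters as a tensor constant that fx lifts onto the module as a `get_attr` target (the `_tensor_constant*` attributes); untested against the poster's model:

```python
import torch
from torch import fx

class Scale(torch.nn.Module):
    def forward(self, x):
        return x * torch.tensor(0.2, dtype=torch.float64)

gm = fx.symbolic_trace(Scale())

# fx stores the tensor literal as an attribute fetched via a get_attr
# node, so we can simply swap the attribute for a float32 copy.
for node in gm.graph.nodes:
    if node.op == "get_attr":
        t = getattr(gm, node.target)
        if isinstance(t, torch.Tensor) and t.dtype == torch.float64:
            setattr(gm, node.target, t.float())

print(gm(torch.randn(4)).dtype)  # torch.float32
```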
This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.
Hi, is there any way to convert to float within TRT itself? Changing every model is not feasible, and scoping out every line of huge models is tedious and non-scalable.
Related issues: inception_v3 pretrained compilation - Unsupported ATen data type Double - https://github.com/pytorch/TensorRT/issues/1096
Fix Inception transform_input to use Float tensors - https://github.com/pytorch/vision/pull/6120
torch_tensorrt does not even support the basic inception_v3 model!!! Just because it has the following statement:
x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
Which should be replaced with
x_ch0 = torch.unsqueeze(x[:, 0], 1) * torch.tensor(0.229 / 0.5) + torch.tensor((0.485 - 0.5) / 0.5)
Torchvision model developers think that the fix should be done in torch_tensorrt; the right thing to do is to add a strength-reduction pass in torch_tensorrt.
@apivovarov Using the following script I am able to run InceptionV3. Please open a new issue if there are still problems, I think the original issue from Jan 2021 has been addressed with truncate_long_and_double
import torch
import torchvision
import torch_tensorrt as torchtrt

model = torch.hub.load('pytorch/vision:v0.13.0', 'inception_v3', pretrained=True)
model = torch.jit.script(model)
model.eval().cuda()

torchtrt.logging.set_reportable_log_level(torchtrt.logging.Level.Graph)
mod = torchtrt.ts.compile(
    model,
    inputs=[torchtrt.Input((1, 3, 300, 300))],
    truncate_long_and_double=True,
    torch_executed_ops=[
        "prim::TupleConstruct",
    ],
)

x = torch.randn((1, 3, 300, 300)).cuda()
print(mod(x))
@narendasan Thank you! Can you double check that it works with shape (1, 3, 299, 299), which is the inception_v3 input shape? https://pytorch.org/hub/pytorch_vision_inception_v3/
Yes, this input size works as well.