
How to convert a model from double to float

Open inocsin opened this issue 4 years ago • 8 comments

When I try to compile a TorchScript model, I get this log:

DEBUG: [TRTorch Conversion Context] - Found IValue containing object of type Double(requires_grad=0, device=cpu)
terminate called after throwing an instance of 'trtorch::Error'
  what():  [enforce fail at core/util/trt_util.cpp:293] Expected aten_trt_type_map.find(t) != aten_trt_type_map.end() to be true but got false
Unsupported Aten datatype

So I tried to convert the model to float using:

script_model = torch.jit.load(path)
script_model = script_model.eval()
script_model = script_model.float()
script_model.save(new_path)

but it still throws the same error.

inocsin avatar Jan 06 '21 09:01 inocsin

can you provide more of the log / the graph?

narendasan avatar Jan 06 '21 20:01 narendasan

can you provide more of the log / the graph?

Log file is here: double.log

gdb backtrace

Thread 1 "trtorchc" received signal SIGABRT, Aborted.
0x00007fff63987438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007fff63987438 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fff6398903a in __GI_abort () at abort.c:89
#2  0x00007ffff7a8ddde in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff7a99896 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff7a99901 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff7a99b55 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x000000000055a628 in trtorch::core::util::toTRTDataType (t=c10::ScalarType::Double) at core/util/trt_util.cpp:293
#7  0x000000000055a714 in trtorch::core::util::toTRTDataType (dtype=...) at core/util/trt_util.cpp:299
#8  0x000000000052f888 in trtorch::core::conversion::converters::Weights::Weights (this=0x7fffffffa300, ctx=0x7fffffffb010, t=...)
    at core/conversion/converters/Weights.cpp:76
#9  0x000000000052e40e in trtorch::core::conversion::Var::ITensorOrFreeze (this=0x67243178, ctx=0x7fffffffb010) at core/conversion/var/Var.cpp:99
#10 0x00000000004a5f32 in trtorch::core::conversion::converters::impl::(anonymous namespace)::<lambda(trtorch::core::conversion::ConversionCtx*, const torch::jit::Node*, trtorch::core::conversion::converters::args&)>::operator()(trtorch::core::conversion::ConversionCtx *, const torch::jit::Node *, trtorch::core::conversion::converters::args &) const (__closure=0x7fffffffaae0, ctx=0x7fffffffb010, n=0x5e391a60, args=std::vector of length 2, capacity 2 = {...})
    at core/conversion/converters/impl/element_wise.cpp:325
#11 0x00000000004aeb5e in std::_Function_handler<bool(trtorch::core::conversion::ConversionCtx*, const torch::jit::Node*, std::vector<trtorch::core::conversion::Var, std::allocator<trtorch::core::conversion::Var> >&), trtorch::core::conversion::converters::impl::(anonymous namespace)::<lambda(trtorch::core::conversion::ConversionCtx*, const torch::jit::Node*, trtorch::core::conversion::converters::args&)> >::_M_invoke(const std::_Any_data &, <unknown type in /home/vincent/.cache/bazel/_bazel_vincent/6ed89a01738d85eb9ea1a1afec2b71b5/execroot/TRTorch/bazel-out/k8-dbg/bin/cpp/trtorchc/trtorchc, CU 0x939ab0, DIE 0x9ff895>, <unknown type in /home/vincent/.cache/bazel/_bazel_vincent/6ed89a01738d85eb9ea1a1afec2b71b5/execroot/TRTorch/bazel-out/k8-dbg/bin/cpp/trtorchc/trtorchc, CU 0x939ab0, DIE 0x9ff8a4>, std::vector<trtorch::core::conversion::Var, std::allocator<trtorch::core::conversion::Var> > &) (__functor=...,
    __args#0=<unknown type in /home/vincent/.cache/bazel/_bazel_vincent/6ed89a01738d85eb9ea1a1afec2b71b5/execroot/TRTorch/bazel-out/k8-dbg/bin/cpp/trtorchc/trtorchc, CU 0x939ab0, DIE 0x9ff895>,
    __args#1=<unknown type in /home/vincent/.cache/bazel/_bazel_vincent/6ed89a01738d85eb9ea1a1afec2b71b5/execroot/TRTorch/bazel-out/k8-dbg/bin/cpp/trtorchc/trtorchc, CU 0x939ab0, DIE 0x9ff8a4>, __args#2=std::vector of length 2, capacity 2 = {...}) at /usr/include/c++/7/bits/std_function.h:302
#12 0x000000000047cd02 in std::function<bool (trtorch::core::conversion::ConversionCtx*, torch::jit::Node const*, std::vector<trtorch::core::conversion::Var, std::allocator<trtorch::core::conversion::Var> >&)>::operator()(trtorch::core::conversion::ConversionCtx*, torch::jit::Node const*, std::vector<trtorch::core::conversion::Var, std::allocator<trtorch::core::conversion::Var> >&) const (this=0x7fffffffaae0, __args#0=0x7fffffffb010, __args#1=0x5e391a60, __args#2=std::vector of length 2, capacity 2 = {...})
    at /usr/include/c++/7/bits/std_function.h:706
#13 0x00000000004769d8 in trtorch::core::conversion::AddLayer (ctx=0x7fffffffb010, n=0x5e391a60) at core/conversion/conversion.cpp:116
#14 0x000000000047a91b in trtorch::core::conversion::ConvertBlockToNetDef (ctx=0x7fffffffb010, b=0x5e30a510, build_info=..., static_params=std::map with 0 elements)
    at core/conversion/conversion.cpp:353
#15 0x000000000047adaf in trtorch::core::conversion::ConvertBlockToEngine[abi:cxx11](torch::jit::Block const*, trtorch::core::conversion::ConversionInfo, std::map<torch::jit::Value*, c10::IValue, std::less<torch::jit::Value*>, std::allocator<std::pair<torch::jit::Value* const, c10::IValue> > >&) (b=0x5e30a510, build_info=...,
    static_params=std::map with 0 elements) at core/conversion/conversion.cpp:380
#16 0x000000000045da60 in trtorch::core::ConvertGraphToTRTEngine (mod=..., method_name="forward", cfg=...) at core/compiler.cpp:150
#17 0x000000000045de12 in trtorch::core::CompileGraph (mod=..., cfg=...) at core/compiler.cpp:163
#18 0x000000000045b033 in trtorch::CompileGraph (module=..., info=...) at cpp/api/src/trtorch.cpp:31
#19 0x000000000042196c in main (argc=5, argv=0x7fffffffdf88) at cpp/trtorchc/main.cpp:382

inocsin avatar Jan 07 '21 02:01 inocsin

This error comes from the aten_to_trt_type mapping. A PyTorch tensor in your graph has a double (float64) datatype, and TensorRT does not support precisions above float32. My guess: looking at your log, the second input to the mul node is a double constant.

%22 : Tensor = prim::Constant[value={0.2}]()
%523 : Tensor = aten::mul(%x5.1, %22)
Adding Layer %523 : Tensor = aten::mul(%x5.1, %22) # /data00/home/roy.he/projects/super_res/infere/models/modules/RRDBNet_arch.py:28:0 (ctx.AddLayer)
DEBUG: [TRTorch Conversion Context] - Node input is an already converted tensor
DEBUG: [TRTorch Conversion Context] - Node input is a result of a previously evaluated value
DEBUG: Frozen tensor shape: [1, 64, 1000, 1000]
DEBUG: [TRTorch Conversion Context] - Found IValue containing object of type Double(requires_grad=0, device=cpu)
terminate called after throwing an instance of 'trtorch::Error'
  what():  [enforce fail at core/util/trt_util.cpp:293] Expected aten_trt_type_map.find(t) != aten_trt_type_map.end() to be true but got false
Unsupported Aten datatype

This %22 is likely a double value, which may be causing the problem. If you have the source code, can you force this constant to be float, regenerate the TorchScript, and try again? There are other similar constants in your model that are represented as explicit float constants, unlike the %22 tensor:

 %13 : float = prim::Constant[value=0.20000000000000001]()
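A hedged sketch of forcing such a constant to float32 at the source, on a stand-in module (the real model is RRDBNet; ResBlock and the 0.2 residual scale here are illustrative):

```python
import torch

class ResBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(64, 64, 3, padding=1)

    def forward(self, x):
        # wrap the residual scale in an explicit float32 tensor so the
        # generated TorchScript graph cannot bake it in as float64
        scale = torch.tensor(0.2, dtype=torch.float32)
        return self.conv(x) * scale + x

m = torch.jit.trace(ResBlock().eval(), torch.randn(1, 64, 8, 8))
y = m(torch.randn(1, 64, 8, 8))
print(y.dtype)  # torch.float32
```

After a change like this, regenerating the TorchScript should leave only float32 constants in the graph.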

peri044 avatar Jan 07 '21 19:01 peri044


Yes. I tried to print the constant values in the script module using:

script_model = torch.jit.load(path)
script_model.eval()
script_model.float()
print(script_model.forward.code_with_constants[1])

The output is:

{'c0': tensor(0.2000, dtype=torch.float64), 'c1': tensor(2.)}

I also tried to modify the constant value with:

script_model.code_with_constants[1].c0 = script_model.code_with_constants[1].c0.float()

but it didn't change the type. So is the only way to modify the original Python module definition? Is there nothing that can be done to modify the constant type through TorchScript?

inocsin avatar Jan 08 '21 09:01 inocsin

@inocsin Can you try torch.fx (https://pytorch.org/docs/stable/fx.html#direct-graph-manipulation)? I'm not sure if it will work, but it lets you modify layers, add new layers, etc. Hopefully these constants can be rewritten to the desired precision.
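A rough sketch of what that could look like with torch.fx, assuming the offending double is an attribute the graph reads via a get_attr node (the module M and its scale buffer below are illustrative):

```python
import torch
import torch.fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # a buffer that was accidentally stored as float64
        self.register_buffer("scale", torch.tensor(0.2, dtype=torch.float64))

    def forward(self, x):
        return x * self.scale

gm = torch.fx.symbolic_trace(M())
# walk the graph and downcast every float64 attribute it touches
for node in gm.graph.nodes:
    if node.op == "get_attr":
        t = getattr(gm, node.target)
        if torch.is_tensor(t) and t.dtype == torch.float64:
            setattr(gm, node.target, t.float())
gm.recompile()

print(gm(torch.randn(2)).dtype)  # torch.float32
```

Note this only reaches tensors stored as module attributes; constants inlined directly into a TorchScript graph would still need the source-level fix above.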

peri044 avatar Apr 08 '21 08:04 peri044

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

github-actions[bot] avatar Jul 08 '21 00:07 github-actions[bot]

Hi, is there any way to convert to float within TRT itself? Changing every model is not feasible, and scoping out every line of huge models is tedious and doesn't scale.

codeislife99 avatar Jun 02 '22 01:06 codeislife99

Related issues: inception_v3 pretrained compilation - Unsupported ATen data type Double - https://github.com/pytorch/TensorRT/issues/1096

Fix Inception transform_input to use Float tensors - https://github.com/pytorch/vision/pull/6120

torch_tensorrt does not even support the basic inception_v3 model, just because it has the following statement:

x_ch0 = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5

which should be replaced with:

x_ch0 = torch.unsqueeze(x[:, 0], 1) * torch.tensor(0.229 / 0.5) + torch.tensor((0.485 - 0.5) / 0.5)

The torchvision model developers think the fix should be done in torch_tensorrt; the right thing to do is to add a strength-reduction pass in torch_tensorrt.
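A quick eager-mode check that the two expressions above are numerically identical and both stay float32 (input shape is illustrative):

```python
import torch

x = torch.randn(1, 3, 8, 8)

# original torchvision expression: Python float scalars
a = torch.unsqueeze(x[:, 0], 1) * (0.229 / 0.5) + (0.485 - 0.5) / 0.5
# proposed replacement: explicit tensor constants (float32 by default)
b = torch.unsqueeze(x[:, 0], 1) * torch.tensor(0.229 / 0.5) + torch.tensor((0.485 - 0.5) / 0.5)

assert a.dtype == b.dtype == torch.float32
torch.testing.assert_close(a, b)
```

In eager mode both forms behave the same; the difference only matters once TorchScript materializes the Python scalars as float64 tensor constants.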

apivovarov avatar Jun 02 '22 01:06 apivovarov

@apivovarov Using the following script I am able to run InceptionV3. Please open a new issue if there are still problems; I think the original issue from Jan 2021 has been addressed with truncate_long_and_double.

import torch
import torchvision
import torch_tensorrt as torchtrt

model = torch.hub.load('pytorch/vision:v0.13.0', 'inception_v3', pretrained=True)
model = torch.jit.script(model)
model.eval().cuda()


torchtrt.logging.set_reportable_log_level(torchtrt.logging.Level.Graph)

mod = torchtrt.ts.compile(
    model,
    inputs=[torchtrt.Input((1, 3, 300, 300))],
    truncate_long_and_double=True,
    torch_executed_ops=[
        "prim::TupleConstruct",
    ]
)

x = torch.randn((1, 3, 300, 300)).cuda()

print(mod(x))

narendasan avatar Aug 12 '22 16:08 narendasan

@narendasan Thank you! Can you double-check that it works with shape (1, 3, 299, 299), which is inception_v3's input shape? https://pytorch.org/hub/pytorch_vision_inception_v3/

apivovarov avatar Aug 12 '22 19:08 apivovarov

Yes, this input size works as well.

narendasan avatar Aug 12 '22 21:08 narendasan