
❓ [Question] How to specify that particular aten operators must be run by LibTorch in C++?

demuxin opened this issue 1 year ago • 7 comments

❓ Question

When I compile the SwinTransformer model using Torch-TensorRT, an error appears:

terminate called after throwing an instance of 'c10::Error'
  what():  0 INTERNAL ASSERT FAILED at "../torch/csrc/jit/ir/alias_analysis.cpp":615, please report a bug to PyTorch. We don't have an op for aten::floor_divide but it isn't a special case.  Argument types: int, int, 

Candidates:
        aten::floor_divide(Tensor self, Tensor other) -> Tensor
        aten::floor_divide.Scalar(Tensor self, Scalar other) -> Tensor
        aten::floor_divide.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
        aten::floor_divide.Scalar_out(Tensor self, Scalar other, *, Tensor(a!) out) -> Tensor(a!)

I checked out this link; the error occurs because Torch-TensorRT doesn't support the % op.

Fine, I can opt to run floor_divide with LibTorch instead:

torchtrt::ts::CompileSpec compile_settings({ input });
compile_settings.enabled_precisions.insert(build_type);
compile_settings.workspace_size = _1_GB;
compile_settings.truncate_long_and_double = true;
compile_settings.num_avg_timing_iters = 1;
compile_settings.torch_executed_ops.push_back("aten::floor_divide");  // force this op to stay in LibTorch
torchtrt::ts::compile(model, compile_settings);

It's strange that the setting does not take effect; the error persists.

What can I do about this error?

Furthermore, how do I specify that particular aten operators must be run by LibTorch in C++?
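
For context, op fallback in the TorchScript path is normally wired up through the partitioning fields of the C++ CompileSpec; a minimal sketch combining torch_executed_ops with them, reusing the input and _1_GB definitions above (as the maintainer notes below, this cannot help here, since compilation fails during lowering, before partitioning):

torchtrt::ts::CompileSpec compile_settings({ input });
compile_settings.enabled_precisions.insert(torchtrt::DataType::kHalf);
compile_settings.workspace_size = _1_GB;
// partial compilation: leave the listed ops in LibTorch and build TRT
// engines around them
compile_settings.require_full_compilation = false;
compile_settings.min_block_size = 3;  // minimum contiguous convertible ops per TRT block
compile_settings.torch_executed_ops.push_back("aten::floor_divide");
auto trt_mod = torchtrt::ts::compile(model, compile_settings);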

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • PyTorch Version (e.g., 1.0): 2.2.1
  • CPU Architecture: x86
  • OS (e.g., Linux): Ubuntu 22.04
  • How you installed PyTorch (conda, pip, libtorch, source):
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version:
  • CUDA version: 12.2
  • GPU models and configuration:
  • Any other relevant information:

demuxin • May 13 '24 10:05

I came up with a solution: I use the code below to replace the % op:

def TakeRemainder(x: int, y: int) -> int:
    # equivalent to x % y for non-negative operands (int() truncates toward zero)
    return x - y * int(x / y)

And it works.

I want to know why this setting doesn't take effect.

compile_settings.torch_executed_ops.push_back("aten::floor_divide"); 

demuxin • May 15 '24 02:05

Hi - thanks for the report. I think this may be related to the following lowering pass, where it's possible that both inputs are upcasted integers, so we accidentally construct a schema which is no longer valid: https://github.com/pytorch/TensorRT/blob/4b993f8ee30fd02b7ab9cff47114a0538562cf81/core/lowering/passes/remove_unnecessary_casts.cpp#L135-L141

Regarding why compile_settings.torch_executed_ops.push_back("aten::floor_divide"); doesn't work: the lowering pass likely puts the graph in an inconsistent or invalid state, and since the "lowering" phase runs before partitioning and conversion to TRT/Torch, compilation fails before torch_executed_ops ever gets the chance to exclude floor_divide from conversion.
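
For what it's worth, the candidate list in the assert can be dumped with plain LibTorch, which shows that every registered overload takes a Tensor self, so a floor_divide call rewritten to take two ints matches no schema. A minimal sketch, assuming the jit operator-registry headers shipped with LibTorch:

#include <torch/script.h>
#include <torch/csrc/jit/runtime/operator.h>
#include <iostream>

int main() {
  // print every registered overload of aten::floor_divide; none accepts
  // (int, int), so the rewritten call matches nothing and alias analysis asserts
  auto sym = c10::Symbol::fromQualString("aten::floor_divide");
  for (const auto& op : torch::jit::getAllOperatorsFor(sym)) {
    std::cout << op->schema() << "\n";
  }
  return 0;
}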

gs-olive • May 15 '24 15:05

Hi - thanks for the report. I think this may be related to the following lowering pass, where it's possible that both inputs are upcasted integers, so we accidentally construct a schema which is no longer valid:

So this is a bug, right? Will you fix this bug in the future?

demuxin • May 16 '24 01:05

Yes, this appears to be a bug and we can work on a fix for this. Do you have a reproducer script or model we could use to recreate the error?

gs-olive • May 17 '24 01:05

Here is the code:

#include <torch/script.h>
#include <torch_tensorrt/torch_tensorrt.h>

namespace torchtrt = torch_tensorrt;  // alias used in the snippets above

torch::Device device(torch::DeviceType::CUDA, 0);

torch::jit::script::Module model = torch::jit::load(model_path);
model.to(device);
model.eval();
model.to(torch::kHalf);

std::vector<int64_t> input_dim{1, 3, 832, 1440};
auto input = torchtrt::Input(input_dim, torchtrt::DataType::kHalf);

size_t _1_GB = 1 << 30;
torchtrt::ts::CompileSpec compile_settings({ input });
compile_settings.enabled_precisions.insert(torchtrt::DataType::kHalf);
compile_settings.workspace_size = _1_GB;
compile_settings.truncate_long_and_double = true;
compile_settings.num_avg_timing_iters = 1;
torchtrt::ts::compile(model, compile_settings);

Additionally, I have provided the model via Google Drive.

demuxin • May 17 '24 02:05

Hello - thanks for the details. I am unable to access the model at that link; is the model available elsewhere? Also, could you provide the full debug log, using the following logging level: torchtrt::logging::set_reportable_log_level(torchtrt::logging::Level::kGRAPH);?
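
For reference, a minimal sketch of where that call would go in the reproducer above, assuming torchtrt aliases the torch_tensorrt namespace as before:

#include <torch_tensorrt/logging.h>

// raise verbosity so the lowered and partitioned graphs are printed during compilation
torchtrt::logging::set_reportable_log_level(torchtrt::logging::Level::kGRAPH);
torchtrt::ts::compile(model, compile_settings);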

gs-olive • May 24 '24 05:05

I have changed the access permissions on the model; the link should be accessible now.

demuxin • May 24 '24 05:05