
Run Lidar Centerpoint with TVM

Open mitsudome-r opened this issue 2 years ago • 25 comments

Checklist

  • [X] I've read the contribution guidelines.
  • [X] I've searched other issues and no duplicate issues were found.
  • [X] I've agreed with the maintainers that I can plan this task.

Description

There are packages in Universe that use TensorRT as the neural network inference library. We would like to document a way to convert a TensorRT-based model into a TVM model. We will start the investigation with the lidar_centerpoint package: https://github.com/autowarefoundation/autoware.universe/tree/main/perception/lidar_centerpoint

Purpose

Investigate a way to use TVM.

Possible approaches

TBD

Definition of done

  • [ ] Run lidar_centerpoint using the TVM library
  • [ ] Document the procedure for converting a TensorRT-based module to a TVM-based module

mitsudome-r avatar May 16 '22 08:05 mitsudome-r

Here's a summary of what I learned about TVM (thanks @ambroise-arm for your help, please let me know if there's anything incorrect):

  • TVM aims to be backend agnostic when developing, but once it gets deployed it targets the desired backend
  • The workflow is as follows:
    • Parse the ONNX files via ARM's tvm_cli.py tool (https://github.com/autowarefoundation/modelzoo/blob/master/scripts/tvm_cli/tvm_cli.py), which will tune the model using AutoTVM.
    • The script will generate several files:
      • A network module in the form of a shared library (deploy_lib.so)
      • A JSON file describing the network graph (deploy_graph.json)
      • A file containing the parameters of the network (deploy_param.params)
      • A header file with the configuration of the inference engine (inference_engine_tvm_config.hpp)
  • In the C++ code, load deploy_lib.so via TVM's module-loading mechanism (dlopen under the hood); this keeps the code platform agnostic while remaining performant (a rough sketch of this load-and-run flow follows the list)
  • TVM allows developers to use backend-specific code (e.g. CUDA), though the goal is to abstract that entirely
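
For reference, a minimal Python sketch of this load-and-run flow (the C++ runtime exposes equivalent calls); the input name "spatial_features" and the shapes are placeholders, not values taken from the actual model:

    import numpy as np
    import tvm
    from tvm.contrib import graph_executor

    # Artifacts produced by tvm_cli.py (paths are illustrative).
    lib = tvm.runtime.load_module("deploy_lib.so")
    with open("deploy_graph.json") as f:
        graph_json = f.read()
    with open("deploy_param.params", "rb") as f:
        params_bytes = f.read()

    dev = tvm.cpu(0)  # tvm.vulkan(0) / tvm.cuda(0) for the GPU backends
    module = graph_executor.create(graph_json, lib, dev)
    module.load_params(params_bytes)

    # Run inference on a dummy input (placeholder name and shape).
    module.set_input("spatial_features",
                     tvm.nd.array(np.zeros((1, 32, 560, 560), "float32"), dev))
    module.run()
    out = module.get_output(0).numpy()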

Notes

  • The TVM team is working on their own CLI tool called TVMC (https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html), which would supersede tvm_cli.py
  • ARM ported Autoware.AI's lidar_point_pillars package from TensorRT to TVM; it can be used as a reference for how to migrate to TVM. It currently contains backend-specific code written in CUDA. Additionally, both TensorRT and TVM are supported, and one is chosen at build time (TVM is the default).

esteve avatar May 23 '22 16:05 esteve

@Sharrrrk knows a candidate to work on this. @Sharrrrk, can you ask them to follow up on this issue?

mitsudome-r avatar May 24 '22 07:05 mitsudome-r

I have some problems when compiling centerpoint with TVM, both for the "llvm" backend and the "cuda" backend. I'm not sure if the problem is caused by TVM or by improper configuration file settings. Taking the "pts_backbone_neck_head_default.onnx" model as an example, the settings and the problems are described below.

  1. the "definitation.yaml" of centerpoint centerpoint-backbone-model-struct centerpoint-backbone-definitation

  2. the problems encountered a. for cup backend: centerpoint-backbone-cpu-error

b. for cuda backend: centerpoint-backbone-cuda-error

Looking forward to your advice

liuzf1988 avatar Jun 17 '22 02:06 liuzf1988

Your configuration file looks correct. I think the issue comes from TVM not having good support for dynamic parameters.

I'm not sure what the exact effect of specifying "-1" as a shape parameter is. I've tried letting TVM handle the shape automatically by making the following change in ModelZoo and then building the Docker image locally:

diff --git a/scripts/tvm_cli/tvm_cli.py b/scripts/tvm_cli/tvm_cli.py
index 7d5fa54..fec132a 100755
--- a/scripts/tvm_cli/tvm_cli.py
+++ b/scripts/tvm_cli/tvm_cli.py
@@ -75 +75 @@ def get_network(info):
-        mod, params = relay.frontend.from_onnx(onnx_model, info['input_dict'])
+        mod, params = relay.frontend.from_onnx(onnx_model)

Testing for llvm, the error becomes RuntimeError: Invalid type of axis: <class 'tvm.tir.expr.Add'> (referenced here: https://discuss.tvm.apache.org/t/how-could-us-use-tvm-relay-transform-tomixedprecision/10465/14). Testing for cuda, I get RuntimeError: cuda winograd conv2d doesn't support dynamic input height or width.
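
For context, a minimal sketch of what the patched call does (only the model file name comes from this thread; everything else is illustrative):

    import onnx
    from tvm import relay

    onnx_model = onnx.load("pts_backbone_neck_head_default.onnx")

    # Without a shape dictionary, from_onnx takes the input shapes from the
    # ONNX graph itself; dimensions that are symbolic in the graph stay
    # dynamic, which is what triggers the errors quoted above.
    mod, params = relay.frontend.from_onnx(onnx_model)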

ambroise-arm avatar Jun 17 '22 14:06 ambroise-arm

Recently I've been trying the dev version of TVM (i.e. the main branch) instead of the stable 0.8 release, which seems to solve the RuntimeError: Invalid type of axis: <class 'tvm.tir.expr.Add'> error. However, the RuntimeError: cuda winograd conv2d doesn't support dynamic input height or width error remains.

liuzf1988 avatar Jul 05 '22 05:07 liuzf1988

If we don't have enough engineers who can support TVM packages, then I don't think we can proceed with this task. If there is no progress for another week, we might want to consider using another ML inference library (see the past discussion about using TVM).

mitsudome-r avatar Jul 05 '22 08:07 mitsudome-r

Although it seems we can't compile this model with dynamic parameters in the current state of TVM, I can compile it just fine with fixed parameters. The code for LidarCenterpoint in this repository already hardcodes "batch_size" to 1, and the default values for range_max_*_, range_min_*_ and voxel_size_*_ give "H" and "W" values of 560.

By replacing the "-1" with those values in "definition.yaml", the model compiles without errors (a rough sketch of the equivalent compilation call is shown below).
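
A sketch of the compilation with static shapes; the numbers below are one combination that yields 560, and the input name and channel count are placeholders, not necessarily the actual lidar_centerpoint defaults:

    import onnx
    import tvm
    from tvm import relay

    # The grid size follows from the detection range and the voxel size,
    # e.g. (89.6 - (-89.6)) / 0.32 = 560.
    grid = int((89.6 - (-89.6)) / 0.32)
    shape_dict = {"spatial_features": (1, 32, grid, grid)}

    onnx_model = onnx.load("pts_backbone_neck_head_default.onnx")
    mod, params = relay.frontend.from_onnx(onnx_model, shape=shape_dict)

    with tvm.transform.PassContext(opt_level=3):
        # Compiles once every dimension is static.
        lib = relay.build(mod, target="llvm", params=params)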

@liuzf1988 I see that you are trying to target CUDA as a backend, but support for CUDA is not enabled in autoware/model-zoo-tvm-cli:latest. We have autoware/model-zoo-tvm-cli:latest-cuda for that, but it's not really tested; and even if it compiles we don't have a prebuilt TVM runtime that supports CUDA for the Autoware side. I suggest targeting Vulkan instead if you want GPU acceleration.

ambroise-arm avatar Jul 05 '22 12:07 ambroise-arm

@mitsudome-r So, good news: I was wrong during the ASWG meeting when I said it was a model issue. If we don't plan on changing the width and height parameters, or if we can keep a small number of possible values, then we can probably go ahead with TVM and fixed parameters.

ambroise-arm avatar Jul 05 '22 13:07 ambroise-arm

Yes, I totally agree that for now we should compile the LidarCenterpoint model using fixed default values for "batch_size", "H" and "W". TVM's Relax architecture (i.e. Relay Next), which supports dynamic-shape workloads, is still under development.

liuzf1988 avatar Jul 07 '22 03:07 liuzf1988

@liuzf1988 @ambroise-arm Thanks for the update. @liuzf1988 Could you try your proposed approach of using the default values for the model input?

mitsudome-r avatar Jul 12 '22 16:07 mitsudome-r

@liuzf1988 Are there any updates on this task?

xmfcx avatar Jul 26 '22 16:07 xmfcx

@xmfcx I've been very busy these days; sorry for not replying in time. If I understand correctly, we need to write a document on converting the raw Lidar Centerpoint model into TVM-based modules. Apart from the document, the code for compiling the raw model will also be provided. I'll try to get these done by this weekend.

liuzf1988 avatar Aug 01 '22 02:08 liuzf1988

@ambroise-arm Referring to the official TVM documentation and the existing script (tvm_cli.py) in the Autoware model zoo, I wrote a standalone script for compiling the neural network model. However, there are some issues I need to confirm with you:

  • Which files need to be uploaded to the cloud

In the original script, i.e. tvm_cli.py, the command used to compile the model is graph, lib, params = relay.build(mod, target=info['target'], params=params). However, for the current version of TVM (0.10.dev0), that compilation call is deprecated and becomes lib = relay.build(mod, target=info['target'], params=params).

So we originally had to upload three output files, "deploy_lib.so", "deploy_param.params", and "deploy_graph.json", but now only the "deploy_lib.so" file needs to be uploaded (see the sketch below). In addition, where should the compiled module file (deploy_lib.so) be uploaded; should it go to the Amazon S3 bucket as before? And for which backends should compiled module files be provided (llvm and vulkan, right)?
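
A sketch of the difference described above, as reported in this thread (the input name and shapes are placeholders, and the exact return value of relay.build depends on the TVM version):

    import onnx
    import tvm
    from tvm import relay

    onnx_model = onnx.load("pts_backbone_neck_head_default.onnx")
    mod, params = relay.frontend.from_onnx(
        onnx_model, shape={"spatial_features": (1, 32, 560, 560)})

    with tvm.transform.PassContext(opt_level=3):
        # Older tvm_cli.py style: three artifacts to upload.
        #   graph, lib, params = relay.build(mod, target="llvm", params=params)
        #   -> deploy_graph.json, deploy_lib.so, deploy_param.params
        # Newer TVM: a single factory module bundling graph, params and code.
        lib = relay.build(mod, target="llvm", params=params)

    # Everything is packaged into one shared library.
    lib.export_library("deploy_lib.so")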

  • How to import the centerpoint_tvm package into Autoware.Universe

I see that you proposed a pull request (PR-1181) to port the lidar_apollo_segmentation_tvm package from Autoware.Auto to Autoware.Universe. Do I need to write a similar package named lidar_centerpoint_tvm to port the centerpoint module in the same way? If so, it will need some extra time; this work is in progress.

liuzf1988 avatar Aug 15 '22 02:08 liuzf1988

@liuzf1988

Thanks for your update.

I wrote a standalone script for compiling the neural network model.

If the current script cannot be used as is, do you think it can be modified to accommodate the compilation of Lidar Centerpoint? That would be preferable to having a standalone script per model.

Files

For the distribution of the neural network, the idea would be to use the ModelZoo repository. The onnx network can be committed to the repository (with the relevant definition.yaml file, and possible modifications to the script to have it compile). That way the network will be compiled and uploaded automatically to the S3 bucket as part of the ModelZoo CI (currently for the llvm and vulkan backends); see the diagrams of the architecture.

Currently ModelZoo uses TVM 0.8 to compile the models. And the tvm_vendor package that provides the runtime capability to Autoware is also on version 0.8 at the moment. If there is a need to update to TVM 0.9 for Lidar Centerpoint to work, that can be scheduled. But we won't be using a development version, so no 0.10 until it is released.

Package

Yes, I think it is important to have a dedicated package for lidar_centerpoint_tvm. The tvm_utility package provides a pipeline.hpp header so that all packages that make use of TVM in Autoware can use a common interface both for downloading the compiled models and for the inference. tvm_utility has a yolo_v2_tiny test case as an example. And lidar_apollo_segmentation_tvm acts as an example for a standalone package. I think it would be good for lidar_centerpoint_tvm to also use the interface provided by tvm_utility. That way, when the change in artifacts structure you mentioned with TVM 0.10 happens, it can be taken care of on the Autoware side by only changing tvm_utility instead of the individual packages that make use of it.

ambroise-arm avatar Aug 15 '22 10:08 ambroise-arm

@liuzf1988 we are organising a meeting to discuss a way forward for TVM in Autoware. If you are interested in joining us please confirm your email address. Thank you.

BonoloAWF avatar Aug 25 '22 14:08 BonoloAWF

We had a discussion among @ambroise-arm, @xmfcx, @BonoloAWF, @angry-crab last week on this topic. See this comment for the summary.

mitsudome-r avatar Sep 06 '22 16:09 mitsudome-r

@liuzf1988 Do you have any progress with this issue? @angry-crab might be able to help you if there are any other blocking items.

mitsudome-r avatar Sep 13 '22 08:09 mitsudome-r

@liuzf1988 Hi, it would be great if you could share the definition.yaml files so that I can take over the rest. Thanks.

angry-crab avatar Sep 13 '22 08:09 angry-crab

@ambroise-arm Please correct me if I'm wrong. To sum up, based on the discussion above, we need to do the following:

  1. Fix the input and output sizes of the network.
  2. Push the onnx files along with definition.yaml to the model zoo repo.
  3. Edit the CLI script if necessary.

Also, I was wondering: since tvm_utility is merged, is the cuda backend ready to be tested? Thanks.

angry-crab avatar Sep 13 '22 09:09 angry-crab

@angry-crab Yes to the points you listed, plus:

  • have a way to run the TVM model (I suggested here having a separate Autoware package for that, but it could be something else; I don't know what makes the most sense for what we want to achieve)
  • document the procedure

For the cuda backend, it is possible to compile networks with the script in ModelZoo, and using the result of that compilation with tvm_utility should work. But the automated compilation and release pipeline of the ModelZoo CI only targets llvm and vulkan (although that could be changed if there is a need). So yes, but it will be a manual process. EDIT: One more thing, the TVM runtime provided by tvm_vendor is not compiled with cuda support, so that is also something that will need to be compiled and installed locally. A very manual process then. I would recommend using the Vulkan backend for GPU acceleration, unless there is a specific need for cuda.

ambroise-arm avatar Sep 13 '22 10:09 ambroise-arm

@ambroise-arm Thanks for the information. I've pushed the centerpoint models (CenterPoint). I'll work on the migration to TVM. As for the cuda backend, I'll create another issue after this one is closed.

angry-crab avatar Sep 14 '22 09:09 angry-crab

A ROS2 package named lidar_centerpoint_tvm was finished last week, but I encountered some problems during the debugging process and am working on them this week.

liuzf1988 avatar Sep 15 '22 06:09 liuzf1988

@liuzf1988 Thanks for the update. I had a look into the current centerpoint implementation. It is using some hand-crafted cuda kernels for processing intermediate data. Did you convert these kernels into tvm operations? It would be great if you could share the repo/link so that I could help.

angry-crab avatar Sep 15 '22 06:09 angry-crab

@angry-crab Yes, it takes some time to rewrite the cuda code (currently it has been changed to a std::thread implementation). Can I submit the code as a pull request tomorrow (I need a little time to restore and organize the code)? Then we can debug and optimize the code together.

liuzf1988 avatar Sep 15 '22 07:09 liuzf1988

@angry-crab Here are the two definition files for the voxel_encoder and backbone_neck_head onnx models of centerpoint.

definition.zip

liuzf1988 avatar Sep 15 '22 07:09 liuzf1988

Ideally, cuda kernels could be generated by TVM from high-level abstractions such as TVMScript or tensor expressions. For this issue, it makes more sense to first convert the kernels into C++ code and then migrate them to TVM dialects to ensure correctness.
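
As a toy illustration of that direction (not code from any Autoware package), a trivial element-wise kernel expressed with TVM's tensor expression API, built here for the llvm target; a GPU build would additionally need the loop bound to GPU threads in the schedule:

    import numpy as np
    import tvm
    from tvm import te

    # A stand-in for one of the hand-written pre/post-processing kernels:
    # scale every element by 2.
    n = te.var("n")
    x = te.placeholder((n,), name="x", dtype="float32")
    y = te.compute((n,), lambda i: x[i] * 2.0, name="scale")
    s = te.create_schedule(y.op)
    f = tvm.build(s, [x, y], target="llvm")

    a = tvm.nd.array(np.array([1.0, 2.0, 3.0], dtype="float32"))
    b = tvm.nd.empty((3,), dtype="float32")
    f(a, b)
    print(b.numpy())  # [2. 4. 6.]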

angry-crab avatar Sep 21 '22 07:09 angry-crab

As a reference for using the current tvm_utility, lidar_apollo_segmentation_tvm also uses C++ for preprocessing and postprocessing. I think a TVMScript implementation should be handled in a different issue after this one is closed.

angry-crab avatar Sep 28 '22 06:09 angry-crab

Hi everyone, I'm one of the developers from the TVM community. I'm very pleased to find that TVM is being integrated into Autoware. Feel free to reach out for any help (I'm active in our discussion forum https://discuss.tvm.apache.org/).

P.S. I worked on improving the Vulkan backend last year, and I consider it to be one of TVM's unique strengths. Glad to see that it has found a new industrial use case.

masahi avatar Nov 03 '22 07:11 masahi

Hi, thank you so much for reaching out. TVM is a really great project with amazing features. We will definitely keep integrating/testing TVM. Please also let us know if there is anything we could do for the TVM community!

angry-crab avatar Nov 08 '22 09:11 angry-crab

Closed because the PR is merged.

angry-crab avatar Nov 28 '22 08:11 angry-crab