[WASI-NN] tensorrt: Add support for TensorRT-LLM
Hello, I am a code review agent on flows.network. Here are my reviews of changed source code files in this PR.
.github/workflows/matrix-extensions.json
Potential issues
- **Duplicate `bin` and `target` for TensorRT-LLM**: The `wasi_nn-tensorrt` plugin uses the same `bin` and `target` as the other backends (`libwasmedgePluginWasiNN`), which could lead to build conflicts or unexpected behavior if all backends are compiled together.
- **Missing CUDA Architecture Spec for TensorRT-LLM**: The `wasi_nn-tensorrt` plugin specifies a CUDA compiler but lacks `CMAKE_CUDA_ARCHITECTURES`, unlike similar plugins such as `wasmedge_stablediffusion-cuda-12`. This may lead to suboptimal or incompatible GPU code generation.
- **Platform Specificity for TensorRT-LLM**: The `platforms` field for `wasi_nn-tensorrt` includes only one platform (`ubuntu2004_tensorrt_cuda12`). If TensorRT-LLM is intended to support more platforms, this list should be expanded to match the other backend plugins.
Summary of changes
- Added support for TensorRT-LLM in the `wasi_nn-tensorrt` plugin.
- Configured the build options to include `-DWASMEDGE_PLUGIN_WASI_NN_BACKEND=TensorRT` and to specify the CUDA compiler path.
- Targeted the platform `ubuntu2004_tensorrt_cuda12` specifically for this configuration.
.github/workflows/reusable-build-extensions.yml
Potential issues
- **Matrix Duplication Issue**: The `ubuntu2004_x86_64` asset tag is duplicated in multiple matrix entries within the `build_on_ubuntu` job, which could lead to confusion or unintended behavior.
- **Hardcoded Docker Tags**: The Docker tags (`ubuntu-20.04-build-clang-plugins-deps`, etc.) are hardcoded in the workflow YAML, which makes maintenance and updates more difficult if these tags change.
- **Conditional Logic for Release Builds**: The condition `if: ${{ !inputs.release }}` on `build_on_ubuntu_latest` might unintentionally exclude valid combinations of inputs that should trigger this job, leading to incomplete build coverage.
Summary of changes
- Added a new CI job targeting `ubuntu2004_tensorrt_cuda12`.
- Included `ubuntu2004_tensorrt_cuda12` in the list of asset tags to be processed.
- Configured a matrix strategy to build and test with TensorRT-LLM using CUDA 12 on Ubuntu 20.04.
cmake/WASINNDeps.cmake
Potential issues
- **Missing Architecture Check for TensorRT**: The `wasmedge_setup_tensorrt_target` function does not include architecture checks similar to those in the TensorFlow setup functions, which can lead to build failures on unsupported architectures.
- **No Dependency Handling for TensorRT Plugins**: The `wasmedge_setup_tensorrt_target` function links against `nvinfer_plugin_tensorrt_llm`, but it does not verify that this library is available or handle the case where it is not found, leading to potential linking errors.
- **Hardcoded Tag in TensorRT-LLM Setup**: The `wasmedge_setup_tensorrt_target` function uses a hardcoded tag (`v0.14.0`) when fetching the TensorRT-LLM repository. This should be parameterized or made configurable so the version can be updated without modifying the code directly.
Summary of changes
- Added TensorRT backend support: Introduced a new backend option for TensorRT in the `wasmedge_setup_wasinn_target` function.
- Included the TensorRT-LLM library: Defined a new function `wasmedge_setup_tensorrt_target` to fetch, build, and link the TensorRT-LLM library.
- Configured CMake options for TensorRT-LLM: Set internal CMake variables to disable unnecessary builds (PyTorch, tests, benchmarks) when preparing the TensorRT-LLM target.
plugins/wasi_nn/CMakeLists.txt
Potential issues
- **Issue 1**: The `BACKEND` variable is converted to lowercase using `string(TOLOWER ${BACKEND} BACKEND)`, which may lead to unexpected behavior if the backend names are case-sensitive elsewhere in the code.
- **Issue 2**: The inclusion of MLX-specific source files is hardcoded within the `foreach` loop, which makes it difficult to manage or add new backends without modifying this section of the code.
- **Issue 3**: There is no validation that the `WASMEDGE_PLUGIN_WASI_NN_BACKEND` list contains valid backend names before they are processed in the `foreach` loop, potentially leading to build errors if an invalid name is included.
Summary of changes
- Added `wasinn_tensorrt.cpp` to the list of source files in the `wasmedgePluginWasiNN` library.
- This change introduces support for TensorRT-LLM within the WasmEdge plugin, adding new inference capabilities.
plugins/wasi_nn/wasinn_tensorrt.cpp
Potential issues
- **Incorrect Error Handling in `Tokenizer::load`**: The function returns `ErrNo::InvalidArgument` on a `std::bad_alloc`, which is misleading because `std::bad_alloc` indicates an allocation failure rather than an invalid argument. It should return a different error code or handle the exception in a way that reflects the memory allocation failure.
- **Potential Buffer Overflow in `Tokenizer::parse`**: The function uses `reinterpret_cast<const char *>(NormalizedString.data()) + I` without checking that `I` is within bounds. If `I` exceeds the string's size minus the length of a special token, the read goes past the end of the `NormalizedString` buffer, causing undefined behavior.
- **Inconsistent Error Handling in `load`**: When parsing metadata, if `Parser.parse(Metadata).get(Doc)` fails, the code logs an error and pops the last added graph from `Env.NNGraph`. However, subsequent errors within the same block (e.g., missing keys or invalid values) do not remove the graph, leaving the environment in an inconsistent state. All error paths should revert partial state changes (see the sketch after this list).
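One way to guarantee the rollback for the third issue is a small scope guard that pops the graph unless loading completes. The following is a minimal sketch rather than the plugin's actual code: `Env.NNGraph` matches the review above, but the surrounding types and the `GraphGuard` helper are illustrative.

```cpp
#include <vector>

// Illustrative stand-ins for the plugin's types.
struct Graph {};
struct Environment { std::vector<Graph> NNGraph; };
enum class ErrNo { Success, InvalidArgument };

// Pops the most recently added graph unless release() is called.
class GraphGuard {
public:
  explicit GraphGuard(Environment &E) noexcept : Env(&E) {}
  ~GraphGuard() {
    if (Env != nullptr) {
      Env->NNGraph.pop_back();
    }
  }
  void release() noexcept { Env = nullptr; }

private:
  Environment *Env;
};

ErrNo load(Environment &Env /*, builders, metadata, ... */) {
  Env.NNGraph.emplace_back();
  GraphGuard Guard(Env); // Reverts the push on any early return below.

  // ... parse metadata; on any failure simply `return ErrNo::InvalidArgument;`
  // and the guard removes the half-initialized graph automatically.

  Guard.release(); // Loading succeeded; keep the graph.
  return ErrNo::Success;
}
```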
Summary of changes
- Added TensorRT-LLM Support: Introduced comprehensive support for TensorRT-LLM, including tokenizer initialization and parsing/unparsing logic.
- Plugin Initialization: Added a `LoggerProxy` class to handle TensorRT logging and initialized the TensorRT-LLM plugins using `initTrtLlmPlugins`.
- Graph Loading Logic: Enhanced the graph loading function to handle the required files (`rank0.engine`, `config.json`, `tokenizer.json`) and additional metadata options, ensuring proper initialization of the TensorRT executor.
plugins/wasi_nn/wasinn_tensorrt.h
Potential issues
- **Issue with Tokenizer Initialization**: The `Tokenizer` class has static members `Byte2U8Char` and `U8Char2Byte` that are never initialized, which could lead to undefined behavior if they are used before initialization (see the sketch after this list).
- **Potential Memory Leak in Graph Structure**: The `Graph` struct contains a `std::unique_ptr<tensorrt_llm::executor::Executor>` but defines no custom destructor or move semantics. If an exception occurs during the construction of the `Executor`, resources could leak.
- **Inconsistent Initialization of Special Token IDs**: In the `Tokenizer` class, several special token IDs (e.g., `SEPId`, `PADId`) are initialized to `-1`, which might not be a valid or intended value for these identifiers in the context of the application. This could cause issues during token parsing and unparsing if the expected values differ.
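A common remedy for the first issue is to replace the class-level statics with function-local statics that are built on first use, which avoids static initialization order problems. This is a minimal sketch under that assumption; the map types and population logic are illustrative, not the header's real declarations.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Illustrative: table mapping raw byte values to their byte-level
// UTF-8 representations, constructed once on first use.
static const std::map<uint8_t, std::string> &byte2U8Char() {
  static const std::map<uint8_t, std::string> Table = [] {
    std::map<uint8_t, std::string> M;
    // ... populate the byte-to-character entries here ...
    return M;
  }();
  return Table;
}

// Inverse table, derived from the one above so both stay consistent.
static const std::map<std::string, uint8_t> &u8Char2Byte() {
  static const std::map<std::string, uint8_t> Table = [] {
    std::map<std::string, uint8_t> M;
    for (const auto &[Byte, Char] : byte2U8Char()) {
      M.emplace(Char, Byte);
    }
    return M;
  }();
  return Table;
}
```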
Summary of changes
- Added TensorRT-LLM support: Introduced new classes and structures (`Tokenizer`, `Graph`, `Context`) to support TensorRT-LLM in the WASI-NN backend.
- Conditional compilation with `WASMEDGE_PLUGIN_WASI_NN_BACKEND_TENSORRT`: Included TensorRT-specific code only when the corresponding plugin is enabled, preserving modularity.
- Enhanced tokenizer functionality: Implemented methods for loading token mappings from a JSON file and for parsing/unparsing strings into tokens, facilitating interaction with LLM models.
plugins/wasi_nn/wasinnenv.cpp
Potential issues
- **Duplicate Backend Entry**: The `BackendMap` contains duplicate entries for `"pytorchaoti"sv` and `"tensorflow"`, which might cause confusion or unexpected behavior when resolving backends.
- **Error Handling in Model Loading**: In the `load` function, if `FileSize >= Data.max_size()`, the code logs an error but does not explicitly close the file, risking a resource leak.
- **Magic String in Code**: The code uses magic strings such as `"unix://"` for URI validation without defining named constants, which can lead to maintenance issues and errors if these values need to change (see the sketch after this list).
Summary of changes
- Added the TensorRT backend to the `BackendMap` with the key `"tensorrt"`.
- Introduced file size validation in the `load` function, checking whether the file is too big before reading its content (see the sketch below).
- Corrected the formatting of the `BackendMap` initialization by adding a missing comma.
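The size check guards against asking the byte buffer for more memory than it can represent. A minimal sketch of the idea with illustrative names; the real `load` function reads WasmEdge model files rather than using this generic helper. Using `std::ifstream` also addresses the resource concern above: the stream closes itself when it goes out of scope on every path.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <vector>

// Reads a whole file, rejecting sizes the vector cannot hold.
bool loadFile(const std::filesystem::path &Path, std::vector<uint8_t> &Data) {
  std::error_code EC;
  const std::uintmax_t FileSize = std::filesystem::file_size(Path, EC);
  if (EC) {
    std::fprintf(stderr, "cannot stat file: %s\n", EC.message().c_str());
    return false;
  }
  if (FileSize >= Data.max_size()) {
    // Reject before allocating; resize() would throw std::length_error.
    std::fprintf(stderr, "file too big: %" PRIuMAX " bytes\n", FileSize);
    return false;
  }
  std::ifstream In(Path, std::ios::binary); // Closed automatically by RAII.
  if (!In) {
    return false;
  }
  Data.resize(static_cast<size_t>(FileSize));
  In.read(reinterpret_cast<char *>(Data.data()),
          static_cast<std::streamsize>(FileSize));
  return In.good();
}
```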
plugins/wasi_nn/wasinnenv.h
Potential issues
- **Issue with `__builtin_unreachable()` Usage**: `__builtin_unreachable()` is used in switch statements without ensuring that all cases are handled, which can lead to undefined behavior if an unexpected backend is encountered. Ensure that the default case handles or logs unknown backends (see the sketch after this list).
- **Potential Data Race in `Context::init`**: The `Graph::getBackend()` call within `Context::init` could lead to a data race if the `Graph`'s state is modified concurrently. Consider adding locks or otherwise ensuring thread safety for `Graph` operations accessed from multiple threads.
- **Lack of Error Handling in `WasiNNEnvironment::mdBuild`**: The `Load` callback's return value is checked, but nothing is done or logged when an error occurs. Implement proper error handling and logging so failures of `Load` can be diagnosed.
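A defensive alternative for the first issue is to cover every enumerator explicitly and fall back to logging rather than `__builtin_unreachable()`, which is undefined behavior when reached. A minimal sketch with an illustrative enum, not the header's real types:

```cpp
#include <cstdio>

// Illustrative subset of a backend enumeration.
enum class Backend { OpenVINO, GGML, TensorRT };

const char *backendName(Backend B) {
  switch (B) {
  case Backend::OpenVINO:
    return "OpenVINO";
  case Backend::GGML:
    return "GGML";
  case Backend::TensorRT:
    return "TensorRT";
  }
  // Reached only if B holds a value outside the enumeration, e.g. after
  // an unchecked integer cast. Log and degrade instead of invoking UB.
  std::fprintf(stderr, "wasi_nn: unknown backend value %d\n",
               static_cast<int>(B));
  return "Unknown";
}
```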
Summary of changes
- Added TensorRT Support: Included `wasinn_tensorrt.h` to integrate support for TensorRT in the project.
- Enhanced Model Deployment Options: Extended the backend options by adding TensorRT, which allows for potentially more efficient inference on NVIDIA GPUs.
- Prepared for LLM Integration: The inclusion specifically mentions "TensorRT-LLM", indicating preparation for Large Language Model support using this backend.
plugins/wasi_nn/wasinntypes.h
Potential issues
- **Issue**: The `FOR_EACH_BACKEND` macro is missing the `Autodetect` backend.
  - Explanation: The macro lists all backends except `Autodetect`, which could lead to errors or inconsistencies when iterating over backends using this macro.
- **Issue**: The `fmt::formatter<WasmEdge::Host::WASINN::Device>` does not handle the `AUTO` device.
  - Explanation: The switch statement in the formatter for `Device` lacks a case for `AUTO`, so "Unknown" is printed whenever `AUTO` is encountered (see the formatter sketch after this list).
- **Issue**: Missing TensorRT-LLM backend definition and macro inclusion.
  - Explanation: The patch title indicates support for TensorRT-LLM, but there is no corresponding entry in the `Backend` enum or the `FOR_EACH_BACKEND` macro, which would prevent using it.
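The second issue amounts to one missing case in the formatter's switch. A minimal sketch of a fmt formatter that covers `AUTO`; the `Device` enum here is an illustrative stand-in for `WasmEdge::Host::WASINN::Device`.

```cpp
#include <fmt/format.h>

// Illustrative subset of the WASI-NN device enumeration.
enum class Device { CPU, GPU, TPU, AUTO };

template <>
struct fmt::formatter<Device> : fmt::formatter<fmt::string_view> {
  template <typename FormatContext>
  auto format(Device D, FormatContext &Ctx) const {
    fmt::string_view Name = "Unknown";
    switch (D) {
    case Device::CPU:
      Name = "CPU";
      break;
    case Device::GPU:
      Name = "GPU";
      break;
    case Device::TPU:
      Name = "TPU";
      break;
    case Device::AUTO: // The case the review found missing.
      Name = "AUTO";
      break;
    }
    return fmt::formatter<fmt::string_view>::format(Name, Ctx);
  }
};
```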
Summary of changes
- Added TensorRT to the Backend Enum: Introduced `TensorRT` as a new backend option in the `Backend` enum.
- Updated the `FOR_EACH_BACKEND` Macro: Included `F(TensorRT)` in the macro so TensorRT is processed alongside the other backends (see the sketch below).
- Modified the Backend Count Implicitly: Adding TensorRT increases the number of supported backends, which affects any logic relying on backend enumeration.
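For readers unfamiliar with the pattern, `FOR_EACH_BACKEND` is an X-macro: each `F(Name)` entry expands once per use site, so adding `F(TensorRT)` propagates the new backend to every consumer of the list. A minimal sketch with illustrative backend names, not the full list from `wasinntypes.h`:

```cpp
// Each consumer supplies its own F and gets one expansion per backend.
#define FOR_EACH_BACKEND(F)                                                    \
  F(OpenVINO)                                                                  \
  F(GGML)                                                                      \
  F(TensorRT)

// Expands to: OpenVINO, GGML, TensorRT,
enum class Backend {
#define DEFINE_ENUM(Name) Name,
  FOR_EACH_BACKEND(DEFINE_ENUM)
#undef DEFINE_ENUM
};
```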
utils/docker/Dockerfile.ubuntu-cuda
Potential issues
- The `TENSORRT` ARG is used in a way that does not conditionally install the TensorRT packages when it is set to `"ON"`. The `case` statement syntax is incorrect; it should use a pattern match (`ON)`) rather than referencing the variable directly.
- The `CUDA_KEYRING` filename is hardcoded for Ubuntu 20.04, which may not be suitable if the base image changes to a different version of Ubuntu or another Linux distribution.
- Setting `CXXFLAGS="-Wno-error"` globally stops all warnings from being treated as errors, which can hide important issues during compilation; consider using more specific flags or removing it if unnecessary.
Summary of changes
- Added an argument `TENSORRT` with a default value of `OFF`.
- Introduced a conditional installation block that installs TensorRT and related dependencies when `TENSORRT=ON`.
- Updated the list of packages installed in the Dockerfile to include TensorRT-specific dependencies.
utils/docker/docker-bake.ubuntu.hcl
Potential issues
- **Inconsistent Dockerfile Paths**: The `dockerfile` path for `cuda-tensorrt` is set to `"Dockerfile.ubuntu-cuda"`, which might be incorrect if the target is meant to use a TensorRT-specific Dockerfile such as `"Dockerfile.ubuntu-cuda-tensorrt"`.
- **Hardcoded Base Image Tags**: In the `cuda` and `cuda-tensorrt` targets, the base image tag is hardcoded to `"wasmedge/wasmedge:ubuntu-20.04-build-gcc"`. This could lead to maintenance issues if the base image needs to be updated or generalized.
- **Possible Typo in the Contexts Key**: The `target "plugins"` block uses the key `contexts` instead of `context`, which will likely cause an error.
Summary of changes
- Added a new target `cuda-tensorrt` to the targets list.
- Introduced a new Dockerfile configuration for the `cuda-tensorrt` target with TensorRT support, specifying CUDA version 12.6 and enabling TensorRT through build arguments.
- Configured specific Docker contexts and tags for the new `cuda-tensorrt` target to ensure proper image building and tagging.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 80.32%. Comparing base (`0ad562d`) to head (`73f490c`).
Additional details and impacted files
```
@@           Coverage Diff            @@
##           master    #3878    +/-   ##
=========================================
  Coverage   80.32%   80.32%
=========================================
  Files         280      280
  Lines       38920    38920
  Branches     6800     6800
=========================================
  Hits        31264    31264
  Misses       6050     6050
  Partials     1606     1606
```
:umbrella: View full report in Codecov by Sentry.