TensorRT-LLM
Add batch manager static lib for Windows
System Info
- CPU architecture: x64
- GPU: RTX 4090 24G
- CUDA 12.2
Who can help?
@byshiue @nc
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
Following the official documentation for building the TensorRT-LLM wheel using Docker on Windows:
Clone and setup the TensorRT-LLM repository within the container:
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
Build TensorRT-LLM
python .\scripts\build_wheel.py -a "89-real" --trt_root C:\workspace\TensorRT-9.2.0.5\
The above results in the error:
file SIZE requested of path that is not readable:
... C:/workspace/TensorRT-LLM/cpp/tensorrt_llm/batch_manager/x86_64-windows-
msvc/tensorrt_llm_batch_manager_static.lib ...
Expected behavior
The wheel for TensorRT-LLM, i.e. tensorrt_llm-0.7.1-cp310-cp310-win_amd64.whl, is generated successfully.
Actual behavior
Running the wheel build results in an error indicating that the batch manager library for Windows is missing, i.e.
CMake Error at tensorrt_llm/CMakeLists.txt:103 (file):
file SIZE requested of path that is not readable:
C:/workspace/TensorRT-LLM/cpp/tensorrt_llm/batch_manager/x86_64-windows-
msvc/tensorrt_llm_batch_manager_static.lib
Additional notes
Kindly provide a batch manager file for Windows. When I use the one provided in the rel branch, the build fails. The batch manager for Windows is missing in TensorRT-LLM\cpp\tensorrt_llm\batch_manager\.
Could you try running git-lfs pull to make sure you pull the lfs file?
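If git-lfs was not installed when the repository was cloned, the .lib files are checked out as small text pointer files rather than real binaries, which also produces the "file SIZE requested of path that is not readable" CMake error. A minimal sketch in Python for checking this (the path is the one from the error above; run it from the repository root):

```python
from pathlib import Path

# Git LFS pointer files are tiny text files whose first line starts with this spec URL.
LFS_MAGIC = "version https://git-lfs.github.com/spec/"

def is_lfs_pointer(path: Path) -> bool:
    """Return True if `path` looks like an un-fetched Git LFS pointer file."""
    try:
        with path.open("r", encoding="utf-8", errors="strict") as f:
            return f.readline().startswith(LFS_MAGIC)
    except (OSError, UnicodeDecodeError):
        # A real binary .lib is not valid UTF-8 text, so decoding fails.
        return False

lib = Path(r"cpp\tensorrt_llm\batch_manager\x86_64-windows-msvc\tensorrt_llm_batch_manager_static.lib")
if not lib.exists():
    print("library file is missing entirely")
elif is_lfs_pointer(lib):
    print("LFS pointer only - run `git lfs pull` to fetch the real library")
else:
    print("library appears to be a real binary")
```

If it reports a pointer file, `git lfs install` followed by `git lfs pull` should replace it with the actual binary.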
Hi @byshiue, the batch manager static lib for Windows is missing in this branch. I have attached screenshots below. It is available in the rel branch but missing in main. See below:
In main TensorRT-LLM\cpp\tensorrt_llm\batch_manager\ , the only folders present are:
aarch64-linux-gnu
x86_64-linux-gnu
whereas in the rel branch TensorRT-LLM\cpp\tensorrt_llm\batch_manager\ the folders are:
aarch64-linux-gnu
x86_64-linux-gnu
x86_64-windows-msvc
Is it working on rel branch, or by copying the files from the rel branch into main?
Hello
The rel branch compiles without issues.
The main branch with the copied batch_manager library fails to link the tensorrt_llm lib because of unresolved external symbols in batch_manager.
Partial build log:
[1093/1096] Linking CXX shared library tensorrt_llm\tensorrt_llm.dll
FAILED: tensorrt_llm/tensorrt_llm.dll tensorrt_llm/tensorrt_llm.lib
cmd.exe /C "cmd.exe /C "D:\Programs\Python310\Lib\site-packages\cmake\data\bin\cmake.exe -E __create_def D:\git\TensorRT-LLM\cpp\build\tensorrt_llm\CMakeFiles\tensorrt_llm.dir\.\exports.def D:\git\TensorRT-LLM\cpp\build\tensorrt_llm\CMakeFiles\tensorrt_llm.dir\.\exports.def.objs && cd D:\git\TensorRT-LLM\cpp\build" && D:\Programs\Python310\Lib\site-packages\cmake\data\bin\cmake.exe -E vs_link_dll --intdir=tensorrt_llm\CMakeFiles\tensorrt_llm.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests -- "D:\Programs\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\bin\Hostx64\x64\link.exe" /nologo @CMakeFiles\tensorrt_llm.rsp /out:tensorrt_llm\tensorrt_llm.dll /implib:tensorrt_llm\tensorrt_llm.lib /pdb:tensorrt_llm\tensorrt_llm.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /WHOLEARCHIVE:tensorrt_llm_batch_manager_static /DEF:tensorrt_llm\CMakeFiles\tensorrt_llm.dir\.\exports.def && cd ."
LINK: command "D:\Programs\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\bin\Hostx64\x64\link.exe /nologo @CMakeFiles\tensorrt_llm.rsp /out:tensorrt_llm\tensorrt_llm.dll /implib:tensorrt_llm\tensorrt_llm.lib /pdb:tensorrt_llm\tensorrt_llm.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /WHOLEARCHIVE:tensorrt_llm_batch_manager_static /DEF:tensorrt_llm\CMakeFiles\tensorrt_llm.dir\.\exports.def /MANIFEST:EMBED,ID=2" failed (exit code 1120) with the following output:
Creating library tensorrt_llm\tensorrt_llm.lib and object tensorrt_llm\tensorrt_llm.exp
LINK : warning LNK4098: defaultlib "LIBCMT" conflicts with use of other libs; use /NODEFAULTLIB:library
gptSession.cpp.obj : error LNK2019: unresolved external symbol "public: __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::KVCacheManager(int,int,int,int,int,int,int,int,int,int,bool,enum nvinfer1::DataType,class std::shared_ptr<class tensorrt_llm::runtime::CudaStream>,bool,bool)" (??0KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@QEAA@HHHHHHHHHH_NW4DataType@nvinfer1@@V?$shared_ptr@VCudaStream@runtime@tensorrt_llm@@@std@@00@Z) referenced in function "public: __cdecl std::_Ref_count_obj2<class tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager>::_Ref_count_obj2<class tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager><int const &,int const &,int const &,int const &,int const &,int &,int &,int &,int &,int &,bool const &,enum nvinfer1::DataType &,class std::shared_ptr<class tensorrt_llm::runtime::CudaStream>,bool &,bool const &>(int const &,int const &,int const &,int const &,int const &,int &,int &,int &,int &,int &,bool const &,enum nvinfer1::DataType &,class std::shared_ptr<class tensorrt_llm::runtime::CudaStream> &&,bool &,bool const &)" (??$?0AEBHAEBHAEBHAEBHAEBHAEAHAEAHAEAHAEAHAEAHAEB_NAEAW4DataType@nvinfer1@@V?$shared_ptr@VCudaStream@runtime@tensorrt_llm@@@std@@AEA_NAEB_N@?$_Ref_count_obj2@VKVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@@std@@QEAA@AEBH0000AEAH1111AEB_NAEAW4DataType@nvinfer1@@$$QEAV?$shared_ptr@VCudaStream@runtime@tensorrt_llm@@@1@AEA_N2@Z).
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.cpp.obj) : error LNK2019: unresolved external symbol "class std::shared_ptr<class tensorrt_llm::mpi::MpiRequest> __cdecl tensorrt_llm::mpi::bcast_async(void *,unsigned __int64,enum tensorrt_llm::mpi::MpiType,int,struct tensorrt_llm::mpi::MpiComm)" (?bcast_async@mpi@tensorrt_llm@@YA?AV?$shared_ptr@VMpiRequest@mpi@tensorrt_llm@@@std@@PEAX_KW4MpiType@12@HUMpiComm@12@@Z) referenced in function "public: __cdecl tensorrt_llm::batch_manager::`anonymous namespace'::decoderStepAsyncBcast::decoderStepAsyncBcast(class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,int)" (??0decoderStepAsyncBcast@?A0xbdd2a8b7@batch_manager@tensorrt_llm@@QEAA@V?$shared_ptr@VITensor@runtime@tensorrt_llm@@@std@@00000H@Z).
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.cpp.obj) : error LNK2019: unresolved external symbol "void __cdecl tensorrt_llm::mpi::bcast(void *,unsigned __int64,enum tensorrt_llm::mpi::MpiType,int,struct tensorrt_llm::mpi::MpiComm)" (?bcast@mpi@tensorrt_llm@@YAXPEAX_KW4MpiType@12@HUMpiComm@12@@Z) referenced in function "public: __cdecl tensorrt_llm::batch_manager::`anonymous namespace'::decoderStepAsyncBcast::decoderStepAsyncBcast(class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,class std::shared_ptr<class tensorrt_llm::runtime::ITensor>,int)" (??0decoderStepAsyncBcast@?A0xbdd2a8b7@batch_manager@tensorrt_llm@@QEAA@V?$shared_ptr@VITensor@runtime@tensorrt_llm@@@std@@00000H@Z).
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.cpp.obj) : error LNK2019: unresolved external symbol "public: static class std::shared_ptr<class tensorrt_llm::runtime::NcclCommunicator> __cdecl tensorrt_llm::runtime::NcclCommunicator::createPipelineComm(class tensorrt_llm::runtime::WorldConfig const &)" (?createPipelineComm@NcclCommunicator@runtime@tensorrt_llm@@SA?AV?$shared_ptr@VNcclCommunicator@runtime@tensorrt_llm@@@std@@AEBVWorldConfig@23@@Z) referenced in function "public: __cdecl tensorrt_llm::batch_manager::TrtGptModelInflightBatching::TrtGptModelInflightBatching(int,class std::shared_ptr<class nvinfer1::ILogger>,class tensorrt_llm::runtime::GptModelConfig const &,class tensorrt_llm::runtime::WorldConfig const &,class std::vector<unsigned char,class std::allocator<unsigned char> > const &,bool,enum tensorrt_llm::batch_manager::batch_scheduler::SchedulerPolicy,class tensorrt_llm::batch_manager::TrtGptModelOptionalParams const &)" (??0TrtGptModelInflightBatching@batch_manager@tensorrt_llm@@QEAA@HV?$shared_ptr@VILogger@nvinfer1@@@std@@AEBVGptModelConfig@runtime@2@AEBVWorldConfig@62@AEBV?$vector@EV?$allocator@E@std@@@4@_NW4SchedulerPolicy@batch_scheduler@12@AEBVTrtGptModelOptionalParams@12@@Z).
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.cpp.obj) : error LNK2019: unresolved external symbol "public: void __cdecl tensorrt_llm::runtime::NcclCommunicator::send<int const >(int const *,unsigned __int64,int,class tensorrt_llm::runtime::CudaStream const &)const " (??$send@$$CBH@NcclCommunicator@runtime@tensorrt_llm@@QEBAXPEBH_KHAEBVCudaStream@12@@Z) referenced in function "public: void __cdecl tensorrt_llm::runtime::NcclCommunicator::send<int>(class tensorrt_llm::runtime::IBuffer const &,int,class tensorrt_llm::runtime::CudaStream const &)const " (??$send@H@NcclCommunicator@runtime@tensorrt_llm@@QEBAXAEBVIBuffer@12@HAEBVCudaStream@12@@Z).
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.cpp.obj) : error LNK2019: unresolved external symbol "public: void __cdecl tensorrt_llm::runtime::NcclCommunicator::send<float const >(float const *,unsigned __int64,int,class tensorrt_llm::runtime::CudaStream const &)const " (??$send@$$CBM@NcclCommunicator@runtime@tensorrt_llm@@QEBAXPEBM_KHAEBVCudaStream@12@@Z) referenced in function "public: void __cdecl tensorrt_llm::runtime::NcclCommunicator::send<float>(class tensorrt_llm::runtime::IBuffer const &,int,class tensorrt_llm::runtime::CudaStream const &)const " (??$send@M@NcclCommunicator@runtime@tensorrt_llm@@QEBAXAEBVIBuffer@12@HAEBVCudaStream@12@@Z).
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.cpp.obj) : error LNK2019: unresolved external symbol "public: void __cdecl tensorrt_llm::runtime::NcclCommunicator::receive<int>(int *,unsigned __int64,int,class tensorrt_llm::runtime::CudaStream const &)const " (??$receive@H@NcclCommunicator@runtime@tensorrt_llm@@QEBAXPEAH_KHAEBVCudaStream@12@@Z) referenced in function "public: void __cdecl tensorrt_llm::runtime::NcclCommunicator::receive<int>(class tensorrt_llm::runtime::IBuffer &,int,class tensorrt_llm::runtime::CudaStream const &)const " (??$receive@H@NcclCommunicator@runtime@tensorrt_llm@@QEBAXAEAVIBuffer@12@HAEBVCudaStream@12@@Z).
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.cpp.obj) : error LNK2019: unresolved external symbol "public: void __cdecl tensorrt_llm::runtime::NcclCommunicator::receive<float>(float *,unsigned __int64,int,class tensorrt_llm::runtime::CudaStream const &)const " (??$receive@M@NcclCommunicator@runtime@tensorrt_llm@@QEBAXPEAM_KHAEBVCudaStream@12@@Z) referenced in function "public: void __cdecl tensorrt_llm::runtime::NcclCommunicator::receive<float>(class tensorrt_llm::runtime::IBuffer &,int,class tensorrt_llm::runtime::CudaStream const &)const " (??$receive@M@NcclCommunicator@runtime@tensorrt_llm@@QEBAXAEAVIBuffer@12@HAEBVCudaStream@12@@Z).
tensorrt_llm_batch_manager_static.lib(trtGptModelInflightBatching.cpp.obj) : error LNK2019: unresolved external symbol "bool CHECK_DEBUG_ENABLED" (?CHECK_DEBUG_ENABLED@@3_NA) referenced in function "private: class std::unique_ptr<class tensorrt_llm::runtime::decoder_batch::Token const ,struct std::default_delete<class tensorrt_llm::runtime::decoder_batch::Token const > > __cdecl tensorrt_llm::batch_manager::TrtGptModelInflightBatching::decoderStepAsync(class std::map<unsigned __int64,class std::shared_ptr<class tensorrt_llm::batch_manager::LlmRequest>,struct std::less<unsigned __int64>,class std::allocator<struct std::pair<unsigned __int64 const ,class std::shared_ptr<class tensorrt_llm::batch_manager::LlmRequest> > > > &)" (?decoderStepAsync@TrtGptModelInflightBatching@batch_manager@tensorrt_llm@@AEAA?AV?$unique_ptr@$$CBVToken@decoder_batch@runtime@tensorrt_llm@@U?$default_delete@$$CBVToken@decoder_batch@runtime@tensorrt_llm@@@std@@@std@@AEAV?$map@_KV?$shared_ptr@VLlmRequest@batch_manager@tensorrt_llm@@@std@@U?$less@_K@2@V?$allocator@U?$pair@$$CB_KV?$shared_ptr@VLlmRequest@batch_manager@tensorrt_llm@@@std@@@std@@@2@@5@@Z).
tensorrt_llm_batch_manager_static.lib(kvCacheManager.cpp.obj) : error LNK2001: unresolved external symbol "bool CHECK_DEBUG_ENABLED" (?CHECK_DEBUG_ENABLED@@3_NA).
tensorrt_llm_batch_manager_static.lib(kvCacheManager.cpp.obj) : error LNK2019: unresolved external symbol "int __cdecl tensorrt_llm::mpi::getCommWorldSize(void)" (?getCommWorldSize@mpi@tensorrt_llm@@YAHXZ) referenced in function "public: static int __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::getMaxNumTokens(class tensorrt_llm::batch_manager::kv_cache_manager::KvCacheConfig const &,enum nvinfer1::DataType,class tensorrt_llm::runtime::GptModelConfig const &,class tensorrt_llm::runtime::WorldConfig const &,class tensorrt_llm::runtime::BufferManager const &)" (?getMaxNumTokens@KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@SAHAEBVKvCacheConfig@234@W4DataType@nvinfer1@@AEBVGptModelConfig@runtime@4@AEBVWorldConfig@94@AEBVBufferManager@94@@Z).
tensorrt_llm_batch_manager_static.lib(kvCacheManager.cpp.obj) : error LNK2019: unresolved external symbol "void __cdecl tensorrt_llm::mpi::allreduce(void const *,void *,int,enum tensorrt_llm::mpi::MpiType,enum tensorrt_llm::mpi::MpiOp,struct tensorrt_llm::mpi::MpiComm)" (?allreduce@mpi@tensorrt_llm@@YAXPEBXPEAXHW4MpiType@12@W4MpiOp@12@UMpiComm@12@@Z) referenced in function "public: static int __cdecl tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager::getMaxNumTokens(class tensorrt_llm::batch_manager::kv_cache_manager::KvCacheConfig const &,enum nvinfer1::DataType,class tensorrt_llm::runtime::GptModelConfig const &,class tensorrt_llm::runtime::WorldConfig const &,class tensorrt_llm::runtime::BufferManager const &)" (?getMaxNumTokens@KVCacheManager@kv_cache_manager@batch_manager@tensorrt_llm@@SAHAEBVKvCacheConfig@234@W4DataType@nvinfer1@@AEBVGptModelConfig@runtime@4@AEBVWorldConfig@94@AEBVBufferManager@94@@Z).
tensorrt_llm_batch_manager_static.lib(GptManager.cpp.obj) : error LNK2019: unresolved external symbol "public: static class tensorrt_llm::runtime::WorldConfig __cdecl tensorrt_llm::runtime::WorldConfig::mpi(class nvinfer1::ILogger &,int,class std::optional<int>,class std::optional<int>,class std::optional<class std::vector<int,class std::allocator<int> > >)" (?mpi@WorldConfig@runtime@tensorrt_llm@@SA?AV123@AEAVILogger@nvinfer1@@HV?$optional@H@std@@1V?$optional@V?$vector@HV?$allocator@H@std@@@std@@@7@@Z) referenced in function "public: static class std::shared_ptr<class tensorrt_llm::batch_manager::TrtGptModel> __cdecl tensorrt_llm::batch_manager::TrtGptModelFactory::create(class std::filesystem::path const &,enum tensorrt_llm::batch_manager::TrtGptModelType,int,enum tensorrt_llm::batch_manager::batch_scheduler::SchedulerPolicy,class tensorrt_llm::batch_manager::TrtGptModelOptionalParams const &)" (?create@TrtGptModelFactory@batch_manager@tensorrt_llm@@SA?AV?$shared_ptr@VTrtGptModel@batch_manager@tensorrt_llm@@@std@@AEBVpath@filesystem@5@W4TrtGptModelType@23@HW4SchedulerPolicy@batch_scheduler@23@AEBVTrtGptModelOptionalParams@23@@Z).
tensorrt_llm_batch_manager_static.lib(GptManager.cpp.obj) : error LNK2019: unresolved external symbol "private: static class tensorrt_llm::runtime::MemoryCounters tensorrt_llm::runtime::MemoryCounters::mInstance" (?mInstance@MemoryCounters@runtime@tensorrt_llm@@0V123@A) referenced in function "public: static class tensorrt_llm::runtime::MemoryCounters & __cdecl tensorrt_llm::runtime::MemoryCounters::getInstance(void)" (?getInstance@MemoryCounters@runtime@tensorrt_llm@@SAAEAV123@XZ).
tensorrt_llm\tensorrt_llm.dll : fatal error LNK1120: 13 unresolved externals
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "D:\git\TensorRT-LLM\scripts\build_wheel.py", line 319, in <module>
main(**vars(args))
File "D:\git\TensorRT-LLM\scripts\build_wheel.py", line 164, in main
build_run(
File "D:\Programs\Python310\lib\subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'cmake --build . --config Release --parallel 32 --target tensorrt_llm nvinfer_plugin_tensorrt_llm th_common bindings ' returned non-zero exit status 1.
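An LNK2019/LNK2001 cascade like the one above is the typical symptom of an ABI mismatch: the prebuilt static lib was compiled against headers from a different commit than the main-branch sources it is being linked with. To triage, it helps to list the mangled names of the unresolved externals so they can be compared against what the copied lib actually expects. A minimal sketch (the regex targets the MSVC linker line format shown in the log above):

```python
import re

# Matches MSVC linker diagnostics such as:
#   foo.obj : error LNK2019: unresolved external symbol "..." (?mangled@@...) ...
#   bar.lib(baz.obj) : error LNK2001: unresolved external symbol "..." (?mangled@@...)
# and captures the first parenthesized mangled name, which is the missing symbol.
LNK_RE = re.compile(r'error LNK20(?:19|01):.*?\((\?[^)]+)\)')

def unresolved_symbols(log_text: str) -> list[str]:
    """Return the mangled names of all unresolved externals, deduplicated in order."""
    seen: set[str] = set()
    out: list[str] = []
    for match in LNK_RE.finditer(log_text):
        sym = match.group(1)
        if sym not in seen:
            seen.add(sym)
            out.append(sym)
    return out
```

Each reported symbol can then be checked with `dumpbin /symbols` against the static lib and the freshly built objects; a constructor or function whose mangled signature differs between the two confirms the header/lib version mismatch.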