FasterTransformer conda issue
Thanks a lot for sharing the code. I followed the steps mentioned here for running it locally without docker, but I am getting the following error.
Traceback (most recent call last):
File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/glm_server.py", line 105, in <module>
glm.init_model(512,# output_len,
File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/../../../examples/pytorch/glm/utils/glm.py", line 375, in init_model
self.cuda()
File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/../../../examples/pytorch/glm/utils/glm.py", line 359, in cuda
self.model = self.Glm(get_torch_default_comm(), self.rank, self.head_num, self.size_per_head, self.head_num * self.size_per_head * 8 // 3,
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
1. libth_glm.Glm(arg0: c10d::ProcessGroupNCCL, arg1: int, arg2: int, arg3: int, arg4: int, arg5: int, arg6: int, arg7: int, arg8: int, arg9: int, arg10: int, arg11: int, arg12: int, arg13: List[at::Tensor], arg14: List[at::Tensor], arg15: List[at::Tensor])
This is a known issue, mentioned here. You can try g++-7 or g++-9; in most cases, one of them will work.
Thanks for the reply. I had built with g++-9 and got through the make -j step, but it was failing at the point above. With g++-7, however, I am facing a number of issues at the initial cmake stage itself. Would it be a better option to build PyTorch instead? And would building PyTorch with g++-9 solve the issue?
Yes, that should be the best solution.
I have a similar issue. When running make -j I get the following error:
[100%] Built target pyt_swintransformer
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h: In member function ‘ncclComm* torch_ext::HackNCCLGroup::getcomm(size_t, size_t, const char*)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:40:17: error: cannot convert ‘bool’ to ‘c10d::OpType’
40 | true,
| ^~~~
| |
| bool
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:21,
from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:374:14: note: initializing argument 2 of ‘void c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, const string&, int)’
374 | OpType opType,
| ~~~~~~~^~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc: In member function ‘std::vector<at::Tensor> torch_ext::GlmOp::forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int64_t)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
148 | th::Tensor output_ids = torch::empty({batch_size, beam_width, total_request_output_len},
| ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
150 | th::Tensor output_ids_buf = torch::empty({batch_size, beam_width, total_request_output_len},
| ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
152 | th::Tensor logits_buf = torch::empty({batch_size, beam_width, vocab_size_},
| ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
154 | th::Tensor parent_ids = torch::empty({total_request_output_len, batch_size, beam_width},
| ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
157 | torch::empty({batch_size, beam_width}, torch::dtype(torch::kInt32).device(torch::kCUDA).requires_grad(false));
| ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
159 | torch::empty({batch_size, beam_width}, torch::dtype(torch::kFloat32).device(torch::kCUDA).requires_grad(false));
| ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
make[2]: *** [src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/build.make:76: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/GlmOp.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:6088: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
This is for g++ 9/10.
g++ 7/8 yields a different error about missing torch functions:
[100%] Built target pyt_swintransformer
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:0:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h: In member function ‘ncclComm* torch_ext::HackNCCLGroup::getcomm(size_t, size_t, const char*)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:42:21: error: no matching function for call to ‘torch_ext::HackNCCLGroup::broadcastUniqueNCCLID(ncclUniqueId*, bool, const char*&, size_t&)’
rank);
^
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:21:0,
from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:372:8: note: candidate: void c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, const string&, int)
void broadcastUniqueNCCLID(
^~~~~~~~~~~~~~~~~~~~~
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:372:8: note: no known conversion for argument 2 from ‘bool’ to ‘c10d::OpType’
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc: In member function ‘std::vector<at::Tensor> torch_ext::GlmOp::forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int64_t)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
th::Tensor output_ids = torch::empty({batch_size, beam_width, total_request_output_len},
^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
th::Tensor output_ids_buf = torch::empty({batch_size, beam_width, total_request_output_len},
^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
th::Tensor logits_buf = torch::empty({batch_size, beam_width, vocab_size_},
^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
th::Tensor parent_ids = torch::empty({total_request_output_len, batch_size, beam_width},
^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
torch::empty({batch_size, beam_width}, torch::dtype(torch::kInt32).device(torch::kCUDA).requires_grad(false));
^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
torch::empty({batch_size, beam_width}, torch::dtype(torch::kFloat32).device(torch::kCUDA).requires_grad(false));
^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
make[2]: *** [src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/build.make:76: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/GlmOp.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:6088: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
@volkerha Maybe replacing the true on line 40 with c10d::OpType::SEND, would solve the problem. Just check GlmOp.h and make that small change.
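For reference, the change would look roughly like this. This is a sketch reconstructed from the error messages above; the argument names (the NCCL ID pointer and group name) are guesses, so the surrounding code in GlmOp.h may differ slightly:

```cpp
// GlmOp.h, inside HackNCCLGroup::getcomm (around line 40 per the log).
// Newer PyTorch changed the second parameter of broadcastUniqueNCCLID
// from bool to the c10d::OpType enum, so a bool literal no longer compiles.

// Before (fails with "cannot convert 'bool' to 'c10d::OpType'"):
//   broadcastUniqueNCCLID(&id, true, group_name, rank);

// After:
broadcastUniqueNCCLID(&id, c10d::OpType::SEND, group_name, rank);
```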
@volkerha I tried building with g++-7, but could not get past the cmake stage. Would it be possible for you to share your conda environment as a YAML file or something similar? It would be extremely helpful if you could share your environment details.
Changing the compiler version should not cause many side effects. I recommend trying these steps:
- Delete the build folder
- Set the CC and CXX environment variables
- Retry
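As a concrete sketch of those steps (the compiler paths and cmake flags below are examples only; reuse whatever flags you passed in your original configure invocation):

```shell
# Start from a clean tree so no object files from the old compiler survive
rm -rf build && mkdir build && cd build

# Point CMake at the other compiler via the standard CC/CXX variables
export CC=/usr/bin/gcc-7
export CXX=/usr/bin/g++-7

# Reconfigure and rebuild with the same flags as before (example flags)
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
```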
Maybe replace the true on line 40 with c10d::OpType::SEND
That solved the issue for me 🙏
@todiketan I managed to build with g++ 9. g++ 7 gave me some torch errors as mentioned above. My conda env contains a lot of other dependencies, so I'm not sure how much sense it makes to share it. Also, I have a local cuda-11-6 installation.
Thanks for the reply.
I meet the same issue.
@todiketan Have you solved the problem?
@jimmycrowoo No, I was not able to solve the issue with either g++-7 or g++-9.
The easiest way to solve this problem is to use Docker, since conda environments vary greatly.
@todiketan I met the issue with g++-9, and solved the problem with g++-7
@prnake How can I call the service after starting it with the default Flask app of FasterTransformer? Are there any demos? Thanks!
You can use scripts in the example directory: https://github.com/THUDM/FasterTransformer/tree/main/examples/pytorch/glm, there are tests and a simple gradio frontend.
The root cause of this issue is a pybind11 ABI difference between pytorch and th_glm. Adding the pybind11 macro defines at build time, as in https://github.com/pytorch/pytorch/blob/1fae179ee1b59c42c41f9dc7b55a2cba64737adb/torch/utils/cpp_extension.py#L1975, can fix it.
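Concretely, torch's cpp_extension passes pybind11 ABI tags as preprocessor defines when it compiles extensions. The macro names below are the real pybind11/PyTorch ones, but the values are examples only; the actual values should be read from the installed torch (e.g. torch._C._PYBIND11_COMPILER_TYPE) and passed as -D flags when building th_glm:

```cpp
// Example only: make th_glm use the same pybind11 ABI tags that PyTorch
// was built with, so both sides agree on the pybind11 internals ABI.
// Equivalent to passing -DPYBIND11_COMPILER_TYPE=... etc. on the compiler
// command line, which is what torch/utils/cpp_extension.py does.
#define PYBIND11_COMPILER_TYPE "_gcc"        // example value
#define PYBIND11_STDLIB        "_libstdcpp"  // example value
#define PYBIND11_BUILD_ABI     "_cxxabi1011" // example value
```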
How does this change fix it? I didn't quite understand it.
How was this solved? I didn't quite understand.