GLM-130B

FasterTransformer conda issue

Open todiketan opened this issue 3 years ago • 19 comments

Thanks a lot for sharing the code. I followed the steps mentioned here for running it locally without docker, but I am getting the following error.

Traceback (most recent call last):
  File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/glm_server.py", line 105, in <module>
    glm.init_model(512,# output_len,
  File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/../../../examples/pytorch/glm/utils/glm.py", line 375, in init_model
    self.cuda()
  File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/../../../examples/pytorch/glm/utils/glm.py", line 359, in cuda
    self.model = self.Glm(get_torch_default_comm(), self.rank, self.head_num, self.size_per_head, self.head_num * self.size_per_head * 8 // 3,
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. libth_glm.Glm(arg0: c10d::ProcessGroupNCCL, arg1: int, arg2: int, arg3: int, arg4: int, arg5: int, arg6: int, arg7: int, arg8: int, arg9: int, arg10: int, arg11: int, arg12: int, arg13: List[at::Tensor], arg14: List[at::Tensor], arg15: List[at::Tensor])

todiketan avatar Nov 18 '22 23:11 todiketan

This is a known issue mentioned here; you can try g++-7 or g++-9. In most cases, one of them will work.

prnake avatar Nov 19 '22 03:11 prnake

Thanks for the reply. I had built with g++-9 and got through the make -j step, but it failed at the point above. With g++-7, I am running into a number of issues at the initial cmake stage itself. Would it be better to build PyTorch from source instead, and would building PyTorch with g++-9 solve the issue?

todiketan avatar Nov 19 '22 22:11 todiketan

> Thanks for the reply. I had built with g++-9 and got through the make -j step, but it failed at the point above. With g++-7, I am running into a number of issues at the initial cmake stage itself. Would it be better to build PyTorch from source instead, and would building PyTorch with g++-9 solve the issue?

Yes, that should be the best solution.

prnake avatar Nov 23 '22 09:11 prnake

I have a similar issue. When running make -j I get the following error:

[100%] Built target pyt_swintransformer
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h: In member function ‘ncclComm* torch_ext::HackNCCLGroup::getcomm(size_t, size_t, const char*)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:40:17: error: cannot convert ‘bool’ to ‘c10d::OpType’
   40 |                 true,
      |                 ^~~~
      |                 |
      |                 bool
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:21,
                 from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:374:14: note:   initializing argument 2 of ‘void c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, const string&, int)’
  374 |       OpType opType,
      |       ~~~~~~~^~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc: In member function ‘std::vector<at::Tensor> torch_ext::GlmOp::forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int64_t)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  148 |     th::Tensor output_ids = torch::empty({batch_size, beam_width, total_request_output_len},
      |                                                       ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  150 |     th::Tensor output_ids_buf = torch::empty({batch_size, beam_width, total_request_output_len},
      |                                                           ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  152 |     th::Tensor logits_buf = torch::empty({batch_size, beam_width, vocab_size_},
      |                                                       ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  154 |     th::Tensor parent_ids = torch::empty({total_request_output_len, batch_size, beam_width},
      |                                                                                 ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  157 |         torch::empty({batch_size, beam_width}, torch::dtype(torch::kInt32).device(torch::kCUDA).requires_grad(false));
      |                                   ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  159 |         torch::empty({batch_size, beam_width}, torch::dtype(torch::kFloat32).device(torch::kCUDA).requires_grad(false));
      |                                   ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
make[2]: *** [src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/build.make:76: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/GlmOp.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:6088: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

This is for g++ 9/10.

g++ 7/8 yields a different error about missing torch functions:

[100%] Built target pyt_swintransformer
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:0:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h: In member function ‘ncclComm* torch_ext::HackNCCLGroup::getcomm(size_t, size_t, const char*)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:42:21: error: no matching function for call to ‘torch_ext::HackNCCLGroup::broadcastUniqueNCCLID(ncclUniqueId*, bool, const char*&, size_t&)’
                 rank);
                     ^
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:21:0,
                 from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:372:8: note: candidate: void c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, const string&, int)
   void broadcastUniqueNCCLID(
        ^~~~~~~~~~~~~~~~~~~~~
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:372:8: note:   no known conversion for argument 2 from ‘bool’ to ‘c10d::OpType’
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc: In member function ‘std::vector<at::Tensor> torch_ext::GlmOp::forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int64_t)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
     th::Tensor output_ids = torch::empty({batch_size, beam_width, total_request_output_len},
                                                       ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
     th::Tensor output_ids_buf = torch::empty({batch_size, beam_width, total_request_output_len},
                                                           ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
     th::Tensor logits_buf = torch::empty({batch_size, beam_width, vocab_size_},
                                                       ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
     th::Tensor parent_ids = torch::empty({total_request_output_len, batch_size, beam_width},
                                                                                 ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
         torch::empty({batch_size, beam_width}, torch::dtype(torch::kInt32).device(torch::kCUDA).requires_grad(false));
                                   ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
         torch::empty({batch_size, beam_width}, torch::dtype(torch::kFloat32).device(torch::kCUDA).requires_grad(false));
                                   ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
make[2]: *** [src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/build.make:76: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/GlmOp.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:6088: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

volkerha avatar Nov 24 '22 18:11 volkerha

@volkerha Maybe replacing the `true` on line 40 of GlmOp.h with `c10d::OpType::SEND` would solve the problem. Just check GlmOp.h and make that small change.
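Concretely, the compiler error points at the `broadcastUniqueNCCLID` call inside `HackNCCLGroup::getcomm` in GlmOp.h: newer PyTorch versions changed the second parameter from a bool to `c10d::OpType`. A sketch of the edit follows; the surrounding argument names are reconstructed from the compiler output and may differ in your checkout:

```cpp
// src/fastertransformer/th_op/glm/GlmOp.h, inside HackNCCLGroup::getcomm().
// PyTorch now declares:
//   void broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, const std::string&, int);
// so the old bool argument must become an OpType enumerator:
broadcastUniqueNCCLID(&ncclID,
                      c10d::OpType::SEND,  // was: true
                      storeKey,            // placeholder name; keep your existing argument
                      rank);
```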

prnake avatar Nov 24 '22 18:11 prnake

@volkerha I tried installing with g++-7, but could not get past the cmake stage. Would it be possible for you to share your conda environment as a YAML file or similar? It would be extremely helpful if you could share your environment details.

todiketan avatar Nov 25 '22 08:11 todiketan

> @volkerha I tried installing with g++-7, but could not get past the cmake stage. Would it be possible for you to share your conda environment as a YAML file or similar? It would be extremely helpful if you could share your environment details.

Changing the compiler version should not cause too many side effects; I recommend trying these steps:

  • Delete the build folder
  • Set the CC and CXX environment variables to the new compiler
  • Retry
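As a concrete sketch of those steps (the compiler paths and cmake flags below are assumptions; reuse the options from your original configure):

```shell
# From the FasterTransformer checkout: start from a clean build directory
rm -rf build && mkdir build && cd build

# Point the build at the other compiler (paths are an assumption; adjust to your system)
export CC=/usr/bin/gcc-7
export CXX=/usr/bin/g++-7

# Re-run configure and build with your original options
cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON ..
make -j
```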

prnake avatar Nov 25 '22 08:11 prnake

> Maybe replacing the `true` on line 40 with `c10d::OpType::SEND` would solve the problem.

That solved the issue for me 🙏

@todiketan I managed to build with g++ 9. g++ 7 gave me some torch errors, as mentioned above. My conda env contains a lot of other dependencies, so I'm not sure how much sense it makes to share it. Also, I have a local CUDA 11.6 installation.

volkerha avatar Nov 25 '22 11:11 volkerha

> Maybe replacing the `true` on line 40 with `c10d::OpType::SEND` would solve the problem.
>
> That solved the issue for me 🙏
>
> @todiketan I managed to build with g++ 9. g++ 7 gave me some torch errors, as mentioned above. My conda env contains a lot of other dependencies, so I'm not sure how much sense it makes to share it. Also, I have a local CUDA 11.6 installation.

Thanks for the reply.

prnake avatar Nov 25 '22 12:11 prnake

I'm facing the same issue.

jimmycrowoo avatar Feb 07 '23 08:02 jimmycrowoo

@todiketan have you solved the problem?

jimmycrowoo avatar Feb 07 '23 08:02 jimmycrowoo

@jimmycrowoo No, I was not able to solve the issue with either g++-7 or g++-9.

todiketan avatar Feb 07 '23 09:02 todiketan

The easiest way to solve this problem is to use Docker, since conda environments vary greatly.
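A minimal sketch of the Docker route, assuming an NGC PyTorch base image (the image tag and mount path below are examples; use the image or Dockerfile referenced in the repository's own instructions):

```shell
# Start a GPU container with the repo mounted (image tag is an assumption)
docker run -it --rm --gpus all \
    -v /path/to/GLM-130B:/workspace/GLM-130B \
    nvcr.io/nvidia/pytorch:22.09-py3 bash
# Then build FasterTransformer inside the container following the repo's steps;
# the preinstalled CUDA/compiler/PyTorch combination avoids the conda ABI mismatch.
```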

prnake avatar Feb 07 '23 09:02 prnake

@todiketan I hit the issue with g++-9 and solved it with g++-7.

jimmycrowoo avatar Feb 07 '23 09:02 jimmycrowoo

@prnake How can I call the service after starting it with the default Flask app of FasterTransformer? Are there any demos? Thanks!

jimmycrowoo avatar Feb 07 '23 09:02 jimmycrowoo

> @prnake How can I call the service after starting it with the default Flask app of FasterTransformer? Are there any demos? Thanks!

You can use the scripts in the examples directory: https://github.com/THUDM/FasterTransformer/tree/main/examples/pytorch/glm. There are tests and a simple Gradio frontend.
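For a quick smoke test without the provided scripts, a minimal HTTP client might look like this. The endpoint path, port, and JSON field names here are assumptions for illustration; the actual interface is defined by glm_server.py and the example scripts:

```python
import json
import urllib.request


def build_request(context: str, max_tokens: int = 64) -> bytes:
    """Encode a JSON payload for a hypothetical GLM generation endpoint."""
    return json.dumps({"context": context, "max_tokens": max_tokens}).encode("utf-8")


def query_server(url: str, context: str) -> dict:
    """POST the payload to the Flask server and return the parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=build_request(context),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires the server to be running; URL is an assumption):
#   query_server("http://localhost:5000/generate", "GLM-130B is")
```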

prnake avatar Feb 07 '23 09:02 prnake

The root cause of this issue is a pybind11 ABI difference between PyTorch and th_glm. Adding the pybind11 macro defines at build time, as done in https://github.com/pytorch/pytorch/blob/1fae179ee1b59c42c41f9dc7b55a2cba64737adb/torch/utils/cpp_extension.py#L1975, can fix it.
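For illustration, the linked cpp_extension code derives those defines from ABI tags that torch exposes; a small helper in the same spirit could reconstruct the flags for a manual build. The `_PYBIND11_*` attribute names follow the linked PyTorch source but may vary across versions:

```python
def pybind11_abi_cflags(torch_c) -> list:
    """Rebuild the -DPYBIND11_* compile defines that torch.utils.cpp_extension
    adds when building extensions, so a manual cmake/make build can match
    PyTorch's pybind11 ABI. `torch_c` is expected to be the torch._C module."""
    flags = []
    for attr in ("COMPILER_TYPE", "STDLIB", "BUILD_ABI"):
        val = getattr(torch_c, f"_PYBIND11_{attr}", None)
        if val is not None:
            flags.append(f'-DPYBIND11_{attr}="{val}"')
    return flags


# Usage, with PyTorch installed:
#   import torch
#   print(" ".join(pybind11_abi_cflags(torch._C)))
# The printed flags can then be appended to the C++ compile flags of the build.
```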

champson avatar Apr 04 '23 04:04 champson

How did this change? I didn't quite understand it.

Rateofteasing avatar Apr 26 '23 01:04 Rateofteasing

> Maybe replacing the `true` on line 40 with `c10d::OpType::SEND` would solve the problem.
>
> That solved the issue for me 🙏
>
> @todiketan I managed to build with g++ 9. g++ 7 gave me some torch errors, as mentioned above. My conda env contains a lot of other dependencies, so I'm not sure how much sense it makes to share it. Also, I have a local CUDA 11.6 installation.

How did you solve it? I didn't quite understand.

Rateofteasing avatar Apr 26 '23 01:04 Rateofteasing