GLM-130B

FasterTransformer conda issue

Open todiketan opened this issue 3 years ago • 19 comments

Thanks a lot for sharing the code. I followed the steps mentioned here for running it locally without docker, but I am getting the following error.

Traceback (most recent call last):
  File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/glm_server.py", line 105, in <module>
    glm.init_model(512,# output_len,
  File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/../../../examples/pytorch/glm/utils/glm.py", line 375, in init_model
    self.cuda()
  File "/projects/tir4/users/zhengbaj/exp/GLM-130B/FasterTransformer/examples/pytorch/glm/../../../examples/pytorch/glm/utils/glm.py", line 359, in cuda
    self.model = self.Glm(get_torch_default_comm(), self.rank, self.head_num, self.size_per_head, self.head_num * self.size_per_head * 8 // 3,
TypeError: __init__(): incompatible constructor arguments. The following argument types are supported:
    1. libth_glm.Glm(arg0: c10d::ProcessGroupNCCL, arg1: int, arg2: int, arg3: int, arg4: int, arg5: int, arg6: int, arg7: int, arg8: int, arg9: int, arg10: int, arg11: int, arg12: int, arg13: List[at::Tensor], arg14: List[at::Tensor], arg15: List[at::Tensor])

todiketan avatar Nov 18 '22 23:11 todiketan

This is a known issue mentioned here; you can try g++-7 or g++-9. In most cases, one of them will work.

prnake avatar Nov 19 '22 03:11 prnake

Thanks for the reply. I had built with g++-9 and got through the make -j step, but it failed at the point above. With g++-7, I am running into a number of issues at the initial cmake stage itself. Would it be better to build PyTorch from source instead, and would building PyTorch with g++-9 solve the issue?

todiketan avatar Nov 19 '22 22:11 todiketan

> Thanks for the reply. I had built with g++-9 and got through the make -j step, but it failed at the point above. With g++-7, I am running into a number of issues at the initial cmake stage itself. Would it be better to build PyTorch from source instead, and would building PyTorch with g++-9 solve the issue?

Yes, that should be the best solution.

prnake avatar Nov 23 '22 09:11 prnake

I have a similar issue. When running make -j I get the following error:

[100%] Built target pyt_swintransformer
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h: In member function ‘ncclComm* torch_ext::HackNCCLGroup::getcomm(size_t, size_t, const char*)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:40:17: error: cannot convert ‘bool’ to ‘c10d::OpType’
   40 |                 true,
      |                 ^~~~
      |                 |
      |                 bool
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:21,
                 from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:374:14: note:   initializing argument 2 of ‘void c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, const string&, int)’
  374 |       OpType opType,
      |       ~~~~~~~^~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc: In member function ‘std::vector<at::Tensor> torch_ext::GlmOp::forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int64_t)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  148 |     th::Tensor output_ids = torch::empty({batch_size, beam_width, total_request_output_len},
      |                                                       ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  150 |     th::Tensor output_ids_buf = torch::empty({batch_size, beam_width, total_request_output_len},
      |                                                           ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  152 |     th::Tensor logits_buf = torch::empty({batch_size, beam_width, vocab_size_},
      |                                                       ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  154 |     th::Tensor parent_ids = torch::empty({total_request_output_len, batch_size, beam_width},
      |                                                                                 ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  157 |         torch::empty({batch_size, beam_width}, torch::dtype(torch::kInt32).device(torch::kCUDA).requires_grad(false));
      |                                   ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
  159 |         torch::empty({batch_size, beam_width}, torch::dtype(torch::kFloat32).device(torch::kCUDA).requires_grad(false));
      |                                   ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t’ {aka ‘long unsigned int’} to ‘long int’ [-Wnarrowing]
make[2]: *** [src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/build.make:76: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/GlmOp.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:6088: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

This is for g++ 9/10.

g++ 7/8 yields a different error about missing torch functions:

[100%] Built target pyt_swintransformer
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:0:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h: In member function ‘ncclComm* torch_ext::HackNCCLGroup::getcomm(size_t, size_t, const char*)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:42:21: error: no matching function for call to ‘torch_ext::HackNCCLGroup::broadcastUniqueNCCLID(ncclUniqueId*, bool, const char*&, size_t&)’
                 rank);
                     ^
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h:21:0,
                 from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:372:8: note: candidate: void c10d::ProcessGroupNCCL::broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, const string&, int)
   void broadcastUniqueNCCLID(
        ^~~~~~~~~~~~~~~~~~~~~
/opt/conda/lib/python3.8/site-packages/torch/include/c10d/ProcessGroupNCCL.hpp:372:8: note:   no known conversion for argument 2 from ‘bool’ to ‘c10d::OpType’
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc: In member function ‘std::vector<at::Tensor> torch_ext::GlmOp::forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int64_t)’:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
     th::Tensor output_ids = torch::empty({batch_size, beam_width, total_request_output_len},
                                                       ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:148:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
     th::Tensor output_ids_buf = torch::empty({batch_size, beam_width, total_request_output_len},
                                                           ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:150:59: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
     th::Tensor logits_buf = torch::empty({batch_size, beam_width, vocab_size_},
                                                       ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:152:55: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
     th::Tensor parent_ids = torch::empty({total_request_output_len, batch_size, beam_width},
                                                                                 ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:154:81: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
         torch::empty({batch_size, beam_width}, torch::dtype(torch::kInt32).device(torch::kCUDA).requires_grad(false));
                                   ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:157:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
         torch::empty({batch_size, beam_width}, torch::dtype(torch::kFloat32).device(torch::kCUDA).requires_grad(false));
                                   ^~~~~~~~~~
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:159:35: warning: narrowing conversion of ‘((torch_ext::GlmOp*)this)->torch_ext::GlmOp::beam_width’ from ‘size_t {aka long unsigned int}’ to ‘long int’ inside { } [-Wnarrowing]
make[2]: *** [src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/build.make:76: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/GlmOp.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:6088: src/fastertransformer/th_op/glm/CMakeFiles/th_glm.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

volkerha avatar Nov 24 '22 18:11 volkerha

@volkerha Maybe replacing the `true` on line 40 of GlmOp.h with `c10d::OpType::SEND` would solve the problem. Just check GlmOp.h and make that small change.
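Concretely, the compiler error points at the `broadcastUniqueNCCLID` call inside `HackNCCLGroup::getcomm` in GlmOp.h: newer PyTorch versions changed the second parameter from a bool to `c10d::OpType`. A sketch of the edit follows; the surrounding argument names are reconstructed from the compiler output and may differ in your checkout:

```cpp
// src/fastertransformer/th_op/glm/GlmOp.h, inside HackNCCLGroup::getcomm().
// PyTorch now declares:
//   void broadcastUniqueNCCLID(ncclUniqueId*, c10d::OpType, const std::string&, int);
// so the old bool argument must become an OpType enumerator:
broadcastUniqueNCCLID(&ncclID,
                      c10d::OpType::SEND,  // was: true
                      storeKey,            // placeholder name; keep your existing argument
                      rank);
```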

prnake avatar Nov 24 '22 18:11 prnake

@volkerha I tried installing with g++-7, but could not get past the cmake stage. Would it be possible for you to share your conda environment as a YAML file or similar? It would be extremely helpful if you could share your environment details.

todiketan avatar Nov 25 '22 08:11 todiketan

> @volkerha I tried installing with g++-7, but could not get past the cmake stage. Would it be possible for you to share your conda environment as a YAML file or similar? It would be extremely helpful if you could share your environment details.

Changing the compiler version should not cause too many side effects; I recommend trying these steps:

  • Delete the build folder
  • Set the CC and CXX environment variables to the new compiler
  • Retry
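As a concrete sketch of those steps (the compiler paths and cmake flags below are assumptions; reuse the options from your original configure):

```shell
# From the FasterTransformer checkout: start from a clean build directory
rm -rf build && mkdir build && cd build

# Point the build at the other compiler (paths are an assumption; adjust to your system)
export CC=/usr/bin/gcc-7
export CXX=/usr/bin/g++-7

# Re-run configure and build with your original options
cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON ..
make -j
```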

prnake avatar Nov 25 '22 08:11 prnake

> Maybe replacing the `true` on line 40 with `c10d::OpType::SEND` would solve the problem.

That solved the issue for me 🙏

@todiketan I managed to build with g++ 9. g++ 7 gave me some torch errors, as mentioned above. My conda env contains a lot of other dependencies, so I'm not sure how much sense it makes to share it. Also, I have a local CUDA 11.6 installation.

volkerha avatar Nov 25 '22 11:11 volkerha

> Maybe replacing the `true` on line 40 with `c10d::OpType::SEND` would solve the problem.
>
> That solved the issue for me 🙏
>
> @todiketan I managed to build with g++ 9. g++ 7 gave me some torch errors, as mentioned above. My conda env contains a lot of other dependencies, so I'm not sure how much sense it makes to share it. Also, I have a local CUDA 11.6 installation.

Thanks for the reply.

prnake avatar Nov 25 '22 12:11 prnake

I'm facing the same issue.

jimmycrowoo avatar Feb 07 '23 08:02 jimmycrowoo

@todiketan have you solved the problem?

jimmycrowoo avatar Feb 07 '23 08:02 jimmycrowoo

@jimmycrowoo No, I was not able to solve the issue with either g++-7 or g++-9.

todiketan avatar Feb 07 '23 09:02 todiketan

The easiest way to solve this problem is to use Docker, since conda environments vary greatly.
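A minimal sketch of the Docker route, assuming an NGC PyTorch base image (the image tag and mount path below are examples; use the image or Dockerfile referenced in the repository's own instructions):

```shell
# Start a GPU container with the repo mounted (image tag is an assumption)
docker run -it --rm --gpus all \
    -v /path/to/GLM-130B:/workspace/GLM-130B \
    nvcr.io/nvidia/pytorch:22.09-py3 bash
# Then build FasterTransformer inside the container following the repo's steps;
# the preinstalled CUDA/compiler/PyTorch combination avoids the conda ABI mismatch.
```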

prnake avatar Feb 07 '23 09:02 prnake

@todiketan I hit the issue with g++-9 and solved it with g++-7.

jimmycrowoo avatar Feb 07 '23 09:02 jimmycrowoo

@prnake How can I call the service after starting it with the default Flask app of FasterTransformer? Are there any demos? Thanks!

jimmycrowoo avatar Feb 07 '23 09:02 jimmycrowoo

> @prnake How can I call the service after starting it with the default Flask app of FasterTransformer? Are there any demos? Thanks!

You can use the scripts in the examples directory: https://github.com/THUDM/FasterTransformer/tree/main/examples/pytorch/glm. There are tests and a simple Gradio frontend.
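For a quick smoke test without the provided scripts, a minimal HTTP client might look like this. The endpoint path, port, and JSON field names here are assumptions for illustration; the actual interface is defined by glm_server.py and the example scripts:

```python
import json
import urllib.request


def build_request(context: str, max_tokens: int = 64) -> bytes:
    """Encode a JSON payload for a hypothetical GLM generation endpoint."""
    return json.dumps({"context": context, "max_tokens": max_tokens}).encode("utf-8")


def query_server(url: str, context: str) -> dict:
    """POST the payload to the Flask server and return the parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=build_request(context),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (requires the server to be running; URL is an assumption):
#   query_server("http://localhost:5000/generate", "GLM-130B is")
```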

prnake avatar Feb 07 '23 09:02 prnake

The root cause of this issue is a pybind11 ABI difference between PyTorch and th_glm. Adding the pybind11 macro defines at build time, as done in https://github.com/pytorch/pytorch/blob/1fae179ee1b59c42c41f9dc7b55a2cba64737adb/torch/utils/cpp_extension.py#L1975, can fix it.
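For illustration, the linked cpp_extension code derives those defines from ABI tags that torch exposes; a small helper in the same spirit could reconstruct the flags for a manual build. The `_PYBIND11_*` attribute names follow the linked PyTorch source but may vary across versions:

```python
def pybind11_abi_cflags(torch_c) -> list:
    """Rebuild the -DPYBIND11_* compile defines that torch.utils.cpp_extension
    adds when building extensions, so a manual cmake/make build can match
    PyTorch's pybind11 ABI. `torch_c` is expected to be the torch._C module."""
    flags = []
    for attr in ("COMPILER_TYPE", "STDLIB", "BUILD_ABI"):
        val = getattr(torch_c, f"_PYBIND11_{attr}", None)
        if val is not None:
            flags.append(f'-DPYBIND11_{attr}="{val}"')
    return flags


# Usage, with PyTorch installed:
#   import torch
#   print(" ".join(pybind11_abi_cflags(torch._C)))
# The printed flags can then be appended to the C++ compile flags of the build.
```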

champson avatar Apr 04 '23 04:04 champson

How did this change? I didn't quite understand it.

Rateofteasing avatar Apr 26 '23 01:04 Rateofteasing

> Maybe replacing the `true` on line 40 with `c10d::OpType::SEND` would solve the problem.
>
> That solved the issue for me 🙏
>
> @todiketan I managed to build with g++ 9. g++ 7 gave me some torch errors, as mentioned above. My conda env contains a lot of other dependencies, so I'm not sure how much sense it makes to share it. Also, I have a local CUDA 11.6 installation.

How did you solve it? I didn't quite understand.

Rateofteasing avatar Apr 26 '23 01:04 Rateofteasing