Results: 3 comments of yuyu-san

I tested `facebook/opt-6.7b` on 8 GPUs with TP=8, FP16. It takes around 28 GB per GPU, which looks like each GPU is loading the full FP32 model parameters (6.7B * 4 bytes ≈ 27 GB)...
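A quick sanity check of that arithmetic (a minimal sketch; the parameter count is approximate, and this counts weights only, ignoring activations and KV cache):

```python
# Back-of-the-envelope memory estimate for facebook/opt-6.7b (~6.7e9 params).
params = 6.7e9

# Full FP32 weights on every GPU: ~26.8 GB, which matches the ~28 GB observed.
full_fp32_gb = params * 4 / 1e9

# What a TP=8 FP16 shard *should* occupy per GPU: ~1.7 GB.
tp8_fp16_gb = params * 2 / 8 / 1e9

print(f"full FP32: {full_fp32_gb:.1f} GB, TP=8 FP16 shard: {tp8_fp16_gb:.1f} GB")
```

The gap between ~1.7 GB expected and ~28 GB observed is what suggests the weights are neither sharded nor converted to FP16.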

I have a similar issue. When running `make -j` I get the following error:

```
[100%] Built target pyt_swintransformer
In file included from /home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.cc:17:
/home/jovyan/src/FasterTransformer/src/fastertransformer/th_op/glm/GlmOp.h: In member function ‘ncclComm* torch_ext::HackNCCLGroup::getcomm(size_t,...
```

> Maybe replace the `true` argument that GCC flags at line 40 (`40 | true,`, noted as type `bool`) with `c10d::OpType::SEND`

That solved the issue for me 🙏 @todiketan I managed to build with g++ 9. g++...
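For reference, the suggested fix amounts to replacing the boolean argument at the flagged call site with the `c10d::OpType::SEND` enum value. A hypothetical sketch of the change (the surrounding call is illustrative, not the exact line from `GlmOp.h`; newer PyTorch releases changed the `c10d` signature from taking a `bool` to taking a `c10d::OpType`):

```diff
- broadcastUniqueNCCLID(&ncclID, true, group_name, rank);
+ broadcastUniqueNCCLID(&ncclID, c10d::OpType::SEND, group_name, rank);
```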