subprocess.CalledProcessError: Command '['which', 'c++']' returned non-zero exit status 1.

Open pudasainishushant opened this issue 3 years ago • 3 comments

I am trying to train a masked language model with the Hugging Face Trainer API using DeepSpeed. Training succeeds on Ubuntu, but when I train on CentOS I get this error.

pudasainishushant avatar Dec 19 '21 04:12 pudasainishushant

What is the output of running which c++ in your terminal?
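
The error in the title shows that the command ['which', 'c++'] was run through subprocess and found nothing. A minimal sketch of the same lookup from Python is given here, in case it is easier to run inside the training environment than a shell. The function name find_cxx_compilers and the list of candidate names are illustrative assumptions, not part of DeepSpeed.

```python
import shutil


def find_cxx_compilers(candidates=("c++", "g++", "clang++")):
    """Map each candidate compiler name to its full path on PATH, or None.

    Hypothetical helper: shutil.which performs roughly the lookup that
    the shell command `which` does.
    """
    return {name: shutil.which(name) for name in candidates}


if __name__ == "__main__":
    for name, path in find_cxx_compilers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If c++ is reported as not found, the failing which c++ call is expected, and a C++ compiler most likely needs to be installed or added to PATH.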

tjruwase avatar Dec 30 '21 00:12 tjruwase

What is the output of running which c++ in your terminal?

@tjruwase @pudasainishushant Sorry, I ran into the same problem. Did you ever find a solution? When I run the command which c++, it prints nothing. The ds_report output also looks strange, with many entries marked [NO].

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.10/site-packages/torch']
torch version .................... 1.13.1
deepspeed install path ........... ['/opt/conda/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.8.3, unknown, unknown
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed wheel compiled w. ...... torch 1.13, cuda 11.6

Thanks for your reply.
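
As the NOTE in the report says, ops that are not installed are JIT compiled at runtime, so [NO] in the installed column is expected; the compatible column is the one that shows real problems. JIT compilation does need a working C++ compiler, which matches the empty which c++ result. A minimal sketch for pulling the incompatible ops out of pasted ds_report text is shown here. The helper names and the regular expression are assumptions based only on the line layout above, not a DeepSpeed API.

```python
import re

# Matches lines such as "async_io ............... [NO] ....... [NO]".
# The pattern is an assumption derived from the report layout above.
OP_LINE = re.compile(r"^(\S+)\s+\.+\s+\[(\w+)\]\s+\.+\s+\[(\w+)\]\s*$")


def parse_op_report(text):
    """Return {op_name: (installed, compatible)} for each op line found."""
    ops = {}
    for line in text.splitlines():
        match = OP_LINE.match(line.strip())
        if match:
            name, installed, compatible = match.groups()
            ops[name] = (installed, compatible)
    return ops


def incompatible_ops(ops):
    """List the ops whose 'compatible' column is anything other than OKAY."""
    return sorted(name for name, (_, compat) in ops.items() if compat != "OKAY")


SAMPLE = """\
ninja .................. [OKAY]
async_io ............... [NO] ....... [NO]
cpu_adam ............... [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [NO]
stochastic_transformer . [NO] ....... [OKAY]
"""

if __name__ == "__main__":
    print(incompatible_ops(parse_op_report(SAMPLE)))
```

On the report above, only async_io and sparse_attn are marked incompatible, and both have their own warnings (libaio and triton); the remaining ops are simply waiting to be compiled.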

JomeiLiu avatar Apr 12 '23 10:04 JomeiLiu

Please install the C++ compiler packages on your server, e.g. "sudo yum install gcc-c++".
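
After installing the package, it may help to confirm from the same Python environment that the compiler is now visible and actually runs. The sketch below is one way to do that; compiler_version is a hypothetical helper, and it assumes the compiler accepts the usual --version flag, as gcc and clang do.

```python
import shutil
import subprocess


def compiler_version(name="c++"):
    """Return the first line of `<name> --version`, or None if not on PATH.

    Hypothetical helper; assumes the compiler supports `--version`.
    """
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = compiler_version("c++")
    print(version if version is not None else "c++ is still not on PATH")
```

A None result after installation usually means the install failed or the binary is in a directory that is not on PATH for this environment.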

zx12671 avatar May 02 '23 09:05 zx12671

@zx12671 I ran 'sudo apt-get install g++', but I got the error below. Do you know how to solve it?

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libc6-dev : Depends: libc6 (= 2.31-0ubuntu9.10) but 2.35-0ubuntu3.1 is to be installed
             Depends: libc-dev-bin (= 2.31-0ubuntu9.10) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

niuhuluzhihao avatar Jun 20 '23 17:06 niuhuluzhihao

Hi @summer-silence - you will need to follow the directions from that error and also install one of the packages that is listed there, libc6 for example.

loadams avatar Aug 21 '23 23:08 loadams