
[BUG] transformer_inference.so: cannot open shared object file: No such file or directory

[Open] newsongwf opened this issue 1 year ago • 3 comments

Describe the bug: The error above occurred while running the third stage of RLHF training (step 3, RLHF fine-tuning).

To Reproduce: Run the step 3 single-GPU training script:

    sh step3_rlhf_finetuning/training_scripts/single_gpu/run_1.3b.sh

ds_report output: [screenshot]

Screenshots: [screenshot]

System info:

  • GPUs: four V100 (32 GB)

newsongwf, Apr 26 '23

I am facing the same issue while running inference with DeepSpeed-MII. Any progress?

ryuzakace, May 03 '23

> I am facing same issue while inferencing using ds-mii. Any progress?

No progress.

newsongwf, May 05 '23

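Can you try building DeepSpeed from source with its C++/CUDA ops pre-compiled at install time (DS_BUILD_OPS=1), while skipping the async I/O (DS_BUILD_AIO=0) and sparse attention (DS_BUILD_SPARSE_ATTN=0) ops?

    git clone https://github.com/microsoft/DeepSpeed
    cd DeepSpeed
    DS_BUILD_OPS=1 DS_BUILD_AIO=0 DS_BUILD_SPARSE_ATTN=0 pip install -e . --global-option="build_ext" --global-option="-g" --global-option="-j8" --no-cache -v --disable-pip-version-check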

rohansood10, May 09 '23

I have the same problem running DeepSpeed inference with BLOOM 176B.

jens5588, Jul 10 '23