Finetune_LLMs
`RuntimeError: Error building extension 'cpu_adam'` / `AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'`
I can't figure out how to fix this error. I am trying to run example_run.txt from here: https://github.com/mallorbc/Finetune_GPTNEO_GPTJ6B/blob/main/finetuning_repo/example_run.txt
When I run it, I get the following error. It fails while building the `cpu_adam` extension:
```
RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.__del__ at 0x7f37056321f0>
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/ops/adam/cpu_adam.py", line 97, in __del__
```
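The two errors in the title seem to be connected: the `AttributeError` is reported from `DeepSpeedCPUAdam.__del__` only after the build `RuntimeError`. One plausible reading (an assumption, not checked against DeepSpeed's source) is that the constructor raises while building `cpu_adam`, before `ds_opt_adam` is ever assigned, and Python still runs the finalizer on the half-built object. The pure-Python sketch below mimics that pattern with a hypothetical `CPUAdamLike` class; it is not DeepSpeed code.

```python
import gc
import sys
from typing import List, Tuple


class CPUAdamLike:
    """Hypothetical stand-in that mimics the failure pattern; not DeepSpeed's source."""

    def __init__(self, build_ok: bool) -> None:
        self.opt_id = 0
        if not build_ok:
            # Simulates the 'cpu_adam' JIT build failing inside __init__,
            # before the extension handle is ever assigned.
            raise RuntimeError("Error building extension 'cpu_adam'")
        self.ds_opt_adam = object()  # placeholder for the loaded extension

    def __del__(self) -> None:
        # The finalizer still runs on the half-initialised object, so this
        # attribute lookup raises AttributeError when __init__ failed early.
        _ = self.ds_opt_adam


def reproduce() -> List[Tuple[type, str]]:
    """Return (type, message) for errors Python reports as 'Exception ignored in ...'."""
    reported: List[Tuple[type, str]] = []
    old_hook = sys.unraisablehook  # available since Python 3.8
    # Record only the type and message so the dying object is not kept alive.
    sys.unraisablehook = lambda info: reported.append((info.exc_type, str(info.exc_value)))
    try:
        try:
            CPUAdamLike(build_ok=False)
        except RuntimeError as err:
            print("primary error:", err)
        gc.collect()  # make sure the half-built object has been finalised
    finally:
        sys.unraisablehook = old_hook
    return reported


if __name__ == "__main__":
    for exc_type, message in reproduce():
        print("secondary error:", exc_type.__name__, message)
```

If this reading is right, the `AttributeError` is a side effect and the `cpu_adam` build failure is the error that actually needs fixing.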
Here is the full traceback:
```
Using /home/ubuntu/.cache/torch_extensions/py38_cu116 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py38_cu116/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1013" -I/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /usr/lib/python3/dist-packages/torch/include -isystem /usr/lib/python3/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/lib/python3/dist-packages/torch/include/TH -isystem /usr/lib/python3/dist-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256 -c /home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
FAILED: cpu_adam.o
c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1013" -I/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/usr/include -isystem /usr/lib/python3/dist-packages/torch/include -isystem /usr/lib/python3/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/lib/python3/dist-packages/torch/include/TH -isystem /usr/lib/python3/dist-packages/torch/include/THC -isystem /usr/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++14 -O3 -std=c++14 -g -Wno-reorder -L/usr/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX256 -c /home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
In file included from /usr/lib/python3/dist-packages/torch/include/torch/csrc/api/include/torch/python.h:12,
                 from /usr/lib/python3/dist-packages/torch/include/torch/extension.h:6,
                 from /home/ubuntu/.local/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp:5:
/usr/lib/python3/dist-packages/torch/include/torch/csrc/utils/pybind.h:7:10: fatal error: pybind11/pybind11.h: No such file or directory
    7 | #include <pybind11/pybind11.h>
      |          ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
```
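The build stops because the compiler cannot find `pybind11/pybind11.h` in any directory it searches. The sketch below, using hypothetical helper names `include_dirs` and `find_header`, extracts the `-I`/`-isystem` directories from a compiler command such as the `c++ ...` line in the log and reports which of them contain that header. It does not check GCC's built-in default search paths (for example `/usr/local/include`), so an empty result is a hint, not proof.

```python
import os
import shlex
from typing import List


def include_dirs(command: str) -> List[str]:
    """Collect directories passed as -I<dir>, -I <dir>, -isystem <dir> or -isystem<dir>."""
    tokens = shlex.split(command)
    dirs: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-I", "-isystem") and i + 1 < len(tokens):
            # Flag and directory given as two separate tokens.
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])
        elif tok.startswith("-isystem") and len(tok) > len("-isystem"):
            dirs.append(tok[len("-isystem"):])
        i += 1
    return dirs


def find_header(command: str, header: str = "pybind11/pybind11.h") -> List[str]:
    """Return every include directory from `command` that actually contains `header`."""
    return [d for d in include_dirs(command) if os.path.isfile(os.path.join(d, header))]
```

To use it, paste the failing `c++ ...` command from the log into a string and call `find_header(cmd)`. An empty list is consistent with the `fatal error` above: none of the listed directories provides the header.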
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "run_clm.py", line 485, in