llm-foundry error 'Getting requirements to build wheel'... is the docker image okay??


Hi all,

During the installation step of setup, pip fails at the 'Getting requirements to build wheel' stage.

I am using the mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04 docker image as recommended. It appears that it can't get the requirements?

To replicate:

vast.ai 4xA6000
docker: mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04
Ubuntu 20.04.6 LTS (GNU/Linux 5.15.0-71-generic x86_64)

sudo apt update -y
git clone repo
cd llm-foundry
pip install -e ".[gpu]"

Running setup.py directly with python from the command line gives the same error. Any advice?

Full output:

`Obtaining file:///root/llm-foundry
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting xentropy-cuda-lib@ git+https://github.com/HazyResearch/flash-attention.git@v0.2.8#subdirectory=csrc/xentropy (from llm-foundry==0.1.0)
  Cloning https://github.com/HazyResearch/flash-attention.git (to revision v0.2.8) to /tmp/pip-install-g3gxu5nw/xentropy-cuda-lib_4b7f2cb3f8074e3c9cedcc0980ea12ea
  Running command git clone --filter=blob:none --quiet https://github.com/HazyResearch/flash-attention.git /tmp/pip-install-g3gxu5nw/xentropy-cuda-lib_4b7f2cb3f8074e3c9cedcc0980ea12ea
  Running command git checkout -q 33e0860c9c5667fded5af674882e731909096a7f
  Resolved https://github.com/HazyResearch/flash-attention.git to commit 33e0860c9c5667fded5af674882e731909096a7f
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/root/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/root/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/root/llm-foundry/llmfoundry-venv/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-g8ihincy/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/pip-build-env-g8ihincy/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-g8ihincy/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 487, in run_setup
          super(_BuildMetaLegacyBackend,
        File "/tmp/pip-build-env-g8ihincy/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 2, in <module>
      ModuleNotFoundError: No module named 'torch'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.`

May 16 '23 18:05 jewbot

If you are doing this inside a virtual env, make sure that when you create the venv you carry over the Python packages from the system Python. Torch is installed in the system Python but probably not in your new virtualenv.
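A quick way to check this from the interpreter you are actually using is sketched below. It is only a minimal sketch: the helper name `describe_env` is hypothetical, and it inspects the current interpreter only. The traceback above shows setuptools being loaded from a temporary `pip-build-env-...` directory, so a package that is visible here may still be invisible to that build step.

```python
import importlib.util
import sys


def describe_env(module_name: str = "torch") -> dict:
    """Report whether this interpreter runs in a venv and can find module_name.

    Hypothetical helper for illustration; it does not import the module,
    it only asks the import system whether it can be located.
    """
    # Inside a venv, sys.prefix points at the venv and differs from base_prefix.
    in_venv = sys.prefix != sys.base_prefix
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "in_venv": in_venv,
        "module": module_name,
        "importable": spec is not None,
    }


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

If `in_venv` is true and `importable` is false, one option is to recreate the venv with `python -m venv --system-site-packages <dir>`, which makes the system site-packages visible inside the venv.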

May 17 '23 08:05 nikaashpuri

Thank you for the reply. I tested it both in the venv and as root outside it, with the same result. I also checked my -v for the packages, and it seems I have torch==1.13 in the venv as well.

May 17 '23 10:05 jewbot

Could you please share the Dockerfile?

May 17 '23 14:05 nikaashpuri

Following up here @jewbot: if it helps, this is the Dockerfile we use to build the mosaicml/pytorch:1.13.1_cu117-python3.10-ubuntu20.04 Docker image.

https://github.com/mosaicml/composer/blob/v0.14.1/docker/Dockerfile

Our internal workflow is exactly:

pull Docker image
run pip install -e .[gpu]
run composer train/train.py ...

We run this on many different clouds (AWS, Oracle, GCP, on-premise), so if something is going wrong with the process, I would suspect the system environment on vast.ai, the CUDA drivers, or the way Docker is being run.

May 18 '23 20:05 abhi-mosaic

For me the problem was loading the venv as described in these installation instructions.

May 22 '23 03:05 danielclough

Hi, I needed to update setuptools to version 67.8.0, and after that the setup script completed.
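For anyone comparing against their own environment, here is a minimal sketch that prints the installed setuptools version and checks it against 67.8.0. The threshold is just the version that worked in this case, not a documented requirement of llm-foundry, and the helper names `parse_release` and `meets_minimum` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text: str) -> tuple:
    """Turn '67.8.0' into (67, 8, 0).

    Deliberately simplistic: parsing stops at the first dot-separated piece
    that is not purely numeric, so suffixes such as '.post1' are ignored and
    pre-release tags are not handled properly.
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "67.8.0") -> bool:
    """Return True if the installed version is at least the given minimum."""
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    try:
        installed = version("setuptools")
    except PackageNotFoundError:
        print("setuptools is not installed in this environment")
    else:
        status = "ok" if meets_minimum(installed) else "older than 67.8.0"
        print(f"setuptools {installed}: {status}")
```

If it reports an older version, `pip install --upgrade setuptools` (or `pip install setuptools==67.8.0` to match the version above) inside the same environment should bring it up to date.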

May 22 '23 18:05 jewbot

@abhi-mosaic thank you!

May 22 '23 18:05 jewbot
