
Could you give elaborated steps on how to run llm-foundry on AMD MI250 devices?

Open · Alice1069 opened this issue 1 year ago · 1 comment

I could not run llm-foundry on an AMD 4×MI250 machine.

Steps to reproduce the behavior:

  1. Follow the latest instructions from https://github.com/ROCm/flash-attention/tree/flash_attention_for_rocm, starting from the docker image `rocm/pytorch:rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1`:

     ```shell
     export GPU_ARCHS="gfx90a"
     export PYTHON_SITE_PACKAGES=$(python -c 'import site; print(site.getsitepackages()[0])')
     patch "${PYTHON_SITE_PACKAGES}/torch/utils/hipify/hipify_python.py" hipify_patch.patch
     pip install .
     ```

     Verified the build with `PYTHONPATH=$PWD python benchmarks/benchmark_flash_attention.py`; `pip list` shows `flash-attn 2.0.4`.

  2. Get the llm-foundry v0.7 code and modify setup.py, changing the torch pin from

     `'torch>=2.2.1,<2.3',`

     to

     `'torch>=2.0,<2.0.2',`

     so it matches the torch 2.0.1 in the ROCm image. Then:

     ```shell
     pip3 install --upgrade pip
     pip install -e .
     ```
  3. Command to run:

     ```shell
     python data_prep/convert_dataset_hf.py \
       --dataset c4 --data_subset en \
       --out_root my-copy-c4 --splits train_small val_small \
       --concat_tokens 2048 --tokenizer EleutherAI/gpt-neox-20b --eos_text '<|endoftext|>'
     ```
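The setup.py edit above can also be scripted instead of done by hand. A minimal sketch — the two requirement strings are the pins listed in step 2, but the helper name `repin_torch` is mine, not part of llm-foundry:

```python
# Hypothetical helper: swap llm-foundry's torch pin for one compatible with
# the ROCm PyTorch 2.0.1 shipped in the docker image.
from pathlib import Path

OLD_PIN = "'torch>=2.2.1,<2.3',"
NEW_PIN = "'torch>=2.0,<2.0.2',"

def repin_torch(text: str) -> str:
    """Return setup.py contents with the torch requirement replaced."""
    if OLD_PIN not in text:
        raise ValueError("expected torch pin not found in setup.py")
    return text.replace(OLD_PIN, NEW_PIN)

# Applied to the real file it would be:
#   path = Path("setup.py")
#   path.write_text(repin_torch(path.read_text()))
```

Failing loudly when the pin is missing guards against running the script against a different llm-foundry version whose requirements have moved.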

```shell
composer train/train.py train/yamls/pretrain/mpt-1b.yaml \
  data_local=my-copy-c4 \
  train_loader.dataset.split=train_small \
  eval_loader.dataset.split=val_small \
  max_duration=10ba \
  eval_interval=0 \
  loss_fn=torch_crossentropy \
  save_folder=mpt-1b
```

  1. It failed with a missing `rotary_emb` module.
  2. Ran `pip install rotary_emb`.
  3. Re-ran the command; it then failed with a missing `libcudart.11.0`.
  4. Exported `LD_LIBRARY_PATH` to include `libcudart`.
  5. Re-ran the command; it then failed with a missing `libtorch_cuda.so`.
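Errors about `libcudart` and `libtorch_cuda.so` usually mean the dynamic loader cannot find the libraries bundled with the installed torch wheel, or that a CUDA build of torch was installed over the ROCm one. A hedged sketch of the environment fix, assuming the python inside the ROCm docker image; the path is derived from the wheel layout, not copied from a working setup:

```shell
# Point the dynamic loader at the libs bundled with the installed torch wheel.
# On a ROCm build this directory holds libtorch_hip.so rather than
# libtorch_cuda.so; if the error still asks for libtorch_cuda.so, a CUDA
# wheel of torch is likely installed and should be replaced by the ROCm build.
PYTHON_SITE_PACKAGES=$(python3 -c 'import site; print(site.getsitepackages()[0])')
export LD_LIBRARY_PATH="${PYTHON_SITE_PACKAGES}/torch/lib:${LD_LIBRARY_PATH}"
echo "torch lib dir: ${PYTHON_SITE_PACKAGES}/torch/lib"
```

If the directory printed at the end does not exist, the torch install itself is the problem rather than the loader path.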

Could you give me a detailed version of how to run llm-foundry on AMD MI250? I read through the two blog posts about AMD but did not find the answer there. Any version of the code is OK. Thank you!

Alice1069 · May 27 '24 07:05