LESS
At step 1, a single GPU works while multiple GPUs get stuck.
When I follow the same process as step 1, it works if I set nproc_per_node to 1 in base_training_args.sh
(and export CUDA_VISIBLE_DEVICES to my custom device). However, when I set it to a value larger than 1 (and set CUDA_VISIBLE_DEVICES at the same time), it always gets stuck at this point:
[train set] examples: 13533; # avg tokens: 370.9773254394531
[train set] examples: 13533; # avg completion tokens: 105.39820861816406
/mnt/workspace/anaconda3/envs/LESS/lib/python3.9/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False)
warnings.warn(
[INFO|trainer.py:568] 2024-06-28 22:31:18,153 >> Using auto half precision backend
Also, to avoid another issue, I added base_training_args="$base_training_args --fsdp 'full_shard auto_wrap' --fsdp_config llama_finetune" before setting training_args.
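For reference, here is a rough sketch of what my launch ends up looking like (the torchrun line is my approximation of what the training script does; the device IDs and the trailing flags are placeholders, the actual entry point is whatever base_training_args.sh / the step-1 script uses):

export CUDA_VISIBLE_DEVICES=0,1,2,3   # the 4 H100s I want to use
nproc_per_node=4                      # works when this is 1, hangs when it is larger
base_training_args="$base_training_args --fsdp 'full_shard auto_wrap' --fsdp_config llama_finetune"
torchrun --nproc_per_node=$nproc_per_node ...   # remaining flags exactly as in the repo's script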
The experiment was done on 4 H100 GPUs. The Python version is 3.9.0, and the full pip list is below:
accelerate 0.28.0
aiohttp 3.9.5
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.40.0
certifi 2024.6.2
charset-normalizer 3.3.2
click 8.1.7
datasets 2.20.0
dill 0.3.8
docker-pycreds 0.4.0
fast_jl 0.1.3
filelock 3.15.4
frozenlist 1.4.1
fsspec 2024.5.0
gitdb 4.0.11
GitPython 3.1.43
huggingface-hub 0.23.4
idna 3.7
Jinja2 3.1.4
less 0.1 /mnt/workspace/LESS
MarkupSafe 2.1.5
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
networkx 3.2.1
numpy 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.5.40
nvidia-nvtx-cu12 12.1.105
packaging 24.1
pandas 2.2.2
peft 0.7.1
pip 24.0
platformdirs 4.2.2
protobuf 5.27.2
psutil 6.0.0
pyarrow 16.1.0
pyarrow-hotfix 0.6
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2024.5.15
requests 2.32.3
safetensors 0.4.3
scipy 1.13.1
sentry-sdk 2.7.1
setproctitle 1.3.3
setuptools 69.5.1
six 1.16.0
smmap 5.0.1
sympy 1.12.1
tokenizers 0.15.2
torch 2.1.2
tqdm 4.66.4
traker 0.1.3
transformers 4.36.2
triton 2.1.0
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
wandb 0.17.3
wheel 0.43.0
xxhash 3.4.1
yarl 1.9.4
What should I do to make it run on multiple GPUs? By the way, it works correctly on a server with 2 A100s, though the environment may not be exactly the same.
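If it helps narrow things down, I can rerun with distributed/NCCL debugging turned on, roughly like this (these are standard PyTorch/NCCL environment variables, nothing LESS-specific):

export NCCL_DEBUG=INFO                  # print NCCL initialization and transport details
export TORCH_DISTRIBUTED_DEBUG=DETAIL   # extra consistency checks from torch.distributed
export NCCL_P2P_DISABLE=1               # test whether peer-to-peer transport is causing the hang

and then launch the same multi-GPU run to capture the logs.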