Out of Memory Issue with OpenLLaMA-7B in Default FuseLLM Setting on A100 (80G)
Description
I am currently attempting to reproduce the results of your excellent work, FuseLLM, following the doc (https://github.com/18907305772/FuseAI/blob/main/FuseLLM/README.md). During these operations, I am encountering an Out of Memory (OOM) issue.
It is odd to hit OOM here, since I am strictly following the commands given in the documentation. I also tried ZeRO-3 to reduce memory consumption, as suggested in https://github.com/18907305772/FuseAI/issues/10, but it did not help. My guess is that differing package versions lead to different memory behavior. Could you share your detailed environment information so that I can try again with a matching setup?
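A rough back-of-the-envelope estimate helps frame the problem. The sketch below assumes mixed-precision Adam training with the usual accounting of 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for fp32 optimizer states), a round 7e9 parameters, and ideal ZeRO sharding. It ignores activations, temporary buffers, fragmentation, and anything specific to FuseLLM, so it is only a lower bound. The function name `model_state_gb` is hypothetical and not part of any library.

```python
def model_state_gb(num_params, num_gpus=1, zero_stage=0):
    """Estimate per-GPU memory (in GB, 1 GB = 1e9 bytes) for model states only.

    Assumes mixed-precision Adam: 2 bytes/param for weights, 2 for gradients,
    12 for fp32 optimizer states (master weights, momentum, variance).
    Activations and framework overhead are NOT included.
    """
    weights, grads, optim = 2.0, 2.0, 12.0
    if zero_stage >= 1:   # ZeRO-1 shards optimizer states
        optim /= num_gpus
    if zero_stage >= 2:   # ZeRO-2 additionally shards gradients
        grads /= num_gpus
    if zero_stage >= 3:   # ZeRO-3 additionally shards the weights
        weights /= num_gpus
    return num_params * (weights + grads + optim) / 1e9


if __name__ == "__main__":
    for stage in (0, 1, 2, 3):
        gb = model_state_gb(7e9, num_gpus=8, zero_stage=stage)
        print(f"ZeRO stage {stage}: ~{gb:.2f} GB of model states per GPU")
```

Under these assumptions, unsharded model states come to about 112 GB, which already exceeds one 80G card, while ZeRO-3 across 8 GPUs brings that down to about 14 GB per GPU. If OOM persists with ZeRO-3, the remaining memory is likely going elsewhere (for example activations or large intermediate tensors), though that is only a guess from this estimate.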
For your convenience, below is my relevant environment information.
Environment
Hardware: 8 x Nvidia A100 (80G) GPUs
Python version: 3.9
CUDA version: 11.8
Package Version
absl-py 2.1.0
accelerate 0.24.1
aiohttp 3.9.5
aiosignal 1.3.1
annotated-types 0.7.0
async-timeout 4.0.3
attrs 23.2.0
audioread 3.0.1
certifi 2024.7.4
cffi 1.16.0
charset-normalizer 3.3.2
datasets 2.14.7
decorator 5.1.1
deepspeed 0.14.4
dill 0.3.7
editdistance 0.6.2
einops 0.8.0
filelock 3.15.4
flash_attn 0.2.8
frozenlist 1.4.1
fsspec 2023.10.0
grpcio 1.65.1
hjson 3.1.0
huggingface-hub 0.17.3
idna 3.7
importlib_metadata 8.2.0
Jinja2 3.1.4
joblib 1.4.2
lazy_loader 0.4
librosa 0.10.2.post1
llvmlite 0.43.0
Markdown 3.6
MarkupSafe 2.1.5
mpmath 1.3.0
msgpack 1.0.8
multidict 6.0.5
multiprocess 0.70.15
networkx 3.2.1
ninja 1.11.1.1
numba 0.60.0
numpy 2.0.1
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 12.555.43
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.5.82
nvidia-nvtx-cu12 12.1.105
packaging 24.1
pandas 2.2.2
peft 0.12.0
pip 24.0
platformdirs 4.2.2
pooch 1.8.2
protobuf 4.25.4
psutil 6.0.0
py-cpuinfo 9.0.0
pyarrow 17.0.0
pyarrow-hotfix 0.6
pycparser 2.22
pydantic 2.8.2
pydantic_core 2.20.1
python-dateutil 2.9.0.post0
pytz 2024.1
PyYAML 6.0.1
regex 2024.7.24
requests 2.32.3
safetensors 0.4.3
scikit-learn 1.5.1
scipy 1.13.1
sentencepiece 0.2.0
setuptools 69.5.1
six 1.16.0
soundfile 0.12.1
soxr 0.4.0
sympy 1.13.1
tensorboard 2.17.0
tensorboard-data-server 0.7.2
threadpoolctl 3.5.0
tokenizers 0.14.1
torch 2.4.0
tqdm 4.66.4
transformers 4.35.1
triton 3.0.0
typing_extensions 4.12.2
tzdata 2024.1
urllib3 2.2.2
Werkzeug 3.0.3
wheel 0.43.0
xxhash 3.4.1
yarl 1.9.4
zipp 3.19.2
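To make a side-by-side comparison easier, here is a minimal sketch that prints only the packages most likely to affect memory behavior. The selection in `KEY_PACKAGES` is an assumption about what matters for this issue, and `collect_versions` is a hypothetical helper, not part of FuseLLM. It uses only the standard library, so it can be run in either environment without extra installs.

```python
from importlib import metadata

# Assumed subset of packages relevant to training memory; adjust as needed.
KEY_PACKAGES = [
    "torch",
    "deepspeed",
    "transformers",
    "accelerate",
    "flash_attn",
    "peft",
    "tokenizers",
    "datasets",
]


def collect_versions(packages):
    """Return {package_name: installed_version or 'not installed'}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions(KEY_PACKAGES).items():
        print(f"{name} {version}")
```

Running the same script in the authors' environment and in mine would give two short lists that can be diffed directly.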
Thank you for any assistance or suggestions you might provide.