Running Example on Free T4 GPU through Google Colab
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
The setup code (the first two cells) in the notebook should install the environment, after which all subsequent cells should run and successfully reproduce the training and inference steps for LLMs through Axolotl. https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/colab-notebooks/colab-axolotl-example.ipynb
Current behaviour
The first cell succeeds:

```python
import torch

# Check that a GPU is available; a T4 (free tier) is enough to run this notebook.
assert torch.cuda.is_available()
```
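As a diagnostic aside (my own sketch, not part of the original notebook), this check could also report the GPU's compute capability, which is what actually matters for flash-attn below: a T4 is a Turing card (SM 7.5), while flash-attn requires Ampere (SM 8.0) or newer.

```python
import torch

# flash-attn requires compute capability >= 8.0 (Ampere); a T4 reports (7, 5).
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
print(f"flash-attn supported: {major >= 8}")
```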
The second cell never completes: it hangs at flash attention, and even if we change some of the code, the `accelerate` command doesn't work.

```python
!pip install -e git+https://github.com/axolotl-ai-cloud/axolotl#egg=axolotl
!pip install flash-attn=="2.5.0"
!pip install deepspeed=="0.13.1"
!pip install mlflow=="2.13.0"
```
Here's the tail of the installation output from the cell:

```text
:
:
Attempting uninstall: gcsfs
Found existing installation: gcsfs 2024.6.1
Uninstalling gcsfs-2024.6.1:
Successfully uninstalled gcsfs-2024.6.1
Running setup.py develop for axolotl
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
albumentations 1.4.14 requires pydantic>=2.7.0, but you have pydantic 2.6.3 which is incompatible.
cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 17.0.0 which is incompatible.
ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 17.0.0 which is incompatible.
Successfully installed accelerate-0.34.2 addict-2.4.0 aiobotocore-2.14.0 aiofiles-23.2.1 aioitertools-0.12.0 art-6.2 autoawq-0.2.5 autoawq-kernels-0.0.6 axolotl-0.4.1 bitsandbytes-0.43.3 botocore-1.35.7 colorama-0.4.6 coloredlogs-15.0.1 datasets-2.20.0 dill-0.3.8 docker-pycreds-0.4.0 evaluate-0.4.1 fastapi-0.114.0 ffmpy-0.4.0 fire-0.6.0 fschat-0.2.36 fsspec-2024.5.0 gcsfs-2024.5.0 gitdb-4.0.11 gitpython-3.1.43 gradio-3.50.2 gradio-client-0.6.1 h11-0.14.0 hf_transfer-0.1.8 httpcore-1.0.5 httpx-0.27.2 humanfriendly-10.0 jmespath-1.0.1 latex2mathml-3.77.0 liger-kernel-0.2.1 markdown2-2.5.0 multiprocess-0.70.16 nh3-0.2.18 optimum-1.16.2 orjson-3.10.7 packaging-23.2 peft-0.12.0 pyarrow-17.0.0 pydantic-2.6.3 pydantic-core-2.16.3 pydub-0.25.1 pynvml-11.5.3 python-dotenv-1.0.1 python-multipart-0.0.9 responses-0.18.0 s3fs-2024.5.0 scikit-learn-1.4.2 semantic-version-2.10.0 sentry-sdk-2.13.0 setproctitle-1.3.3 shortuuid-1.0.13 shtab-1.7.1 smmap-5.0.1 starlette-0.38.5 svgwrite-1.4.3 tiktoken-0.7.0 triton-3.0.0 trl-0.9.6 tyro-0.8.10 uvicorn-0.30.6 wandb-0.17.9 wavedrom-2.0.3.post3 websockets-11.0.3 xformers-0.0.27.post2 xxhash-3.5.0 zstandard-0.22.0
Collecting flash-attn==2.5.0
Downloading flash_attn-2.5.0.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 38.5 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.5.0) (2.4.0+cu121)
Requirement already satisfied: einops in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.5.0) (0.8.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.5.0) (23.2)
Collecting ninja (from flash-attn==2.5.0)
Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (3.15.4)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (4.12.2)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (1.13.2)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (3.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (3.1.4)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (2024.5.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->flash-attn==2.5.0) (2.1.5)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->flash-attn==2.5.0) (1.3.0)
Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 kB 23.2 MB/s eta 0:00:00
Building wheels for collected packages: flash-attn
```
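The apparent hang is most likely the flash-attn build itself: there is no prebuilt wheel for this pin in the Colab environment (note the `flash_attn-2.5.0.tar.gz` download above), so pip compiles the CUDA kernels from source, which can take a very long time on Colab's small VM with no visible output. The flash-attention README recommends installing with `--no-build-isolation` and bounding parallel jobs via `MAX_JOBS`; the value below is my guess for a Colab VM, and even a successful build would not help here, since the compiled kernels still refuse to run on a pre-Ampere T4.

```python
# Bound parallel compile jobs so the source build doesn't exhaust Colab's RAM.
!MAX_JOBS=2 pip install flash-attn=="2.5.0" --no-build-isolation
```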
Steps to reproduce

- Open https://colab.research.google.com/ and log in if needed
- Click the GitHub tab, type in `axolotl-ai-cloud` for the organization, and select `axolotl` for the repository
- Select the example notebook
- Run the installation cells in the notebook
Config yaml

```yaml
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
```
Possible solution

This might be because the T4 GPU is not supported by flash-attention: the library requires Ampere (SM 8.0) or newer, and a T4 is a Turing (SM 7.5) card. Here's the documentation about which GPUs are supported: https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#nvidia-cuda-support

I instead used the instructions in the README to install Axolotl and train TinyLlama. This still installs flash-attn somehow (it is listed in the extras), and instead of a hang at install time we get `RuntimeError: FlashAttention only supports Ampere GPUs or newer` at runtime, until the flash attention layer is disabled in the config.
This cell installs it:

```python
!git clone https://github.com/axolotl-ai-cloud/axolotl
%cd axolotl

!pip3 install packaging ninja
!pip3 install -e '.[flash-attn,deepspeed]'
```
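Since flash attention ends up disabled on a T4 anyway, it may also be possible to skip the long flash-attn build entirely by omitting that extra; this is an assumption on my part, not something I verified on Colab:

```python
# Install without the flash-attn extra so pip never compiles its CUDA kernels.
!pip3 install -e '.[deepspeed]'
```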
Then in the config.yaml:

```yaml
flash_attention: false
```
This does run on the free T4 GPU, but it still takes hours to finish and may need a different config.
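A minimal sketch of automating that override, so the same notebook works on both T4 and Ampere runtimes (the config path below is hypothetical; point it at wherever the notebook writes its YAML):

```python
import torch
import yaml

CONFIG_PATH = "qlora.yml"  # hypothetical path; adjust to the notebook's config file

with open(CONFIG_PATH) as f:
    cfg = yaml.safe_load(f)

# A T4 reports compute capability (7, 5); flash-attn needs >= (8, 0).
major, _ = torch.cuda.get_device_capability(0)
cfg["flash_attention"] = major >= 8

with open(CONFIG_PATH, "w") as f:
    yaml.safe_dump(cfg, f)
```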
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.10
axolotl branch-commit
main
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.