Running Example on Free T4 GPU through Google Colab
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
The setup code (the first two cells) in the notebook should install the environment, after which all subsequent cells should run and successfully reproduce the training and inference steps for LLMs through Axolotl. https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/colab-notebooks/colab-axolotl-example.ipynb
Current behaviour
The first cell succeeds:

```python
import torch

# Check that a GPU is available; a T4 (free tier) is enough to run this notebook.
assert torch.cuda.is_available()
```
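As a diagnostic aside (my own sketch, not part of the original notebook), this check could also report the GPU's compute capability, which is what actually matters for flash-attn below: a T4 is a Turing card (SM 7.5), while flash-attn requires Ampere (SM 8.0) or newer.

```python
import torch

# flash-attn requires compute capability >= 8.0 (Ampere); a T4 reports (7, 5).
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
print(f"flash-attn supported: {major >= 8}")
```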
The second cell never completes: it hangs at flash attention, and even if we change some of the code, the `accelerate` command doesn't work.

```python
!pip install -e git+https://github.com/axolotl-ai-cloud/axolotl#egg=axolotl
!pip install flash-attn=="2.5.0"
!pip install deepspeed=="0.13.1"
!pip install mlflow=="2.13.0"
```
Here's the tail of the installation output from the cell:

```text
:
:
Attempting uninstall: gcsfs
Found existing installation: gcsfs 2024.6.1
Uninstalling gcsfs-2024.6.1:
Successfully uninstalled gcsfs-2024.6.1
Running setup.py develop for axolotl
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
albumentations 1.4.14 requires pydantic>=2.7.0, but you have pydantic 2.6.3 which is incompatible.
cudf-cu12 24.4.1 requires pyarrow<15.0.0a0,>=14.0.1, but you have pyarrow 17.0.0 which is incompatible.
ibis-framework 8.0.0 requires pyarrow<16,>=2, but you have pyarrow 17.0.0 which is incompatible.
Successfully installed accelerate-0.34.2 addict-2.4.0 aiobotocore-2.14.0 aiofiles-23.2.1 aioitertools-0.12.0 art-6.2 autoawq-0.2.5 autoawq-kernels-0.0.6 axolotl-0.4.1 bitsandbytes-0.43.3 botocore-1.35.7 colorama-0.4.6 coloredlogs-15.0.1 datasets-2.20.0 dill-0.3.8 docker-pycreds-0.4.0 evaluate-0.4.1 fastapi-0.114.0 ffmpy-0.4.0 fire-0.6.0 fschat-0.2.36 fsspec-2024.5.0 gcsfs-2024.5.0 gitdb-4.0.11 gitpython-3.1.43 gradio-3.50.2 gradio-client-0.6.1 h11-0.14.0 hf_transfer-0.1.8 httpcore-1.0.5 httpx-0.27.2 humanfriendly-10.0 jmespath-1.0.1 latex2mathml-3.77.0 liger-kernel-0.2.1 markdown2-2.5.0 multiprocess-0.70.16 nh3-0.2.18 optimum-1.16.2 orjson-3.10.7 packaging-23.2 peft-0.12.0 pyarrow-17.0.0 pydantic-2.6.3 pydantic-core-2.16.3 pydub-0.25.1 pynvml-11.5.3 python-dotenv-1.0.1 python-multipart-0.0.9 responses-0.18.0 s3fs-2024.5.0 scikit-learn-1.4.2 semantic-version-2.10.0 sentry-sdk-2.13.0 setproctitle-1.3.3 shortuuid-1.0.13 shtab-1.7.1 smmap-5.0.1 starlette-0.38.5 svgwrite-1.4.3 tiktoken-0.7.0 triton-3.0.0 trl-0.9.6 tyro-0.8.10 uvicorn-0.30.6 wandb-0.17.9 wavedrom-2.0.3.post3 websockets-11.0.3 xformers-0.0.27.post2 xxhash-3.5.0 zstandard-0.22.0
Collecting flash-attn==2.5.0
Downloading flash_attn-2.5.0.tar.gz (2.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 38.5 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.5.0) (2.4.0+cu121)
Requirement already satisfied: einops in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.5.0) (0.8.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from flash-attn==2.5.0) (23.2)
Collecting ninja (from flash-attn==2.5.0)
Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (3.15.4)
Requirement already satisfied: typing-extensions>=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (4.12.2)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (1.13.2)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (3.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (3.1.4)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->flash-attn==2.5.0) (2024.5.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->flash-attn==2.5.0) (2.1.5)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->flash-attn==2.5.0) (1.3.0)
Downloading ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 307.2/307.2 kB 23.2 MB/s eta 0:00:00
Building wheels for collected packages: flash-attn
```
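The apparent hang is most likely the flash-attn build itself: there is no prebuilt wheel for this pin in the Colab environment (note the `flash_attn-2.5.0.tar.gz` download above), so pip compiles the CUDA kernels from source, which can take a very long time on Colab's small VM with no visible output. The flash-attention README recommends installing with `--no-build-isolation` and bounding parallel jobs via `MAX_JOBS`; the value below is my guess for a Colab VM, and even a successful build would not help here, since the compiled kernels still refuse to run on a pre-Ampere T4.

```python
# Bound parallel compile jobs so the source build doesn't exhaust Colab's RAM.
!MAX_JOBS=2 pip install flash-attn=="2.5.0" --no-build-isolation
```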
Steps to reproduce

- Open https://colab.research.google.com/ and log in if needed
- Click the GitHub tab, type in `axolotl-ai-cloud` for the organization, and select `axolotl` for the repository
- Select the example notebook
- Run the installation cells in the notebook
Config yaml

```yaml
base_model: TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 4
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
```
Possible solution

This might be because the T4 GPU is not supported by flash-attention: the library requires Ampere (SM 8.0) or newer, and a T4 is a Turing (SM 7.5) card. Here's the documentation about which GPUs are supported: https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#nvidia-cuda-support

I instead used the instructions in the README to install Axolotl and train TinyLlama. This still installs flash-attn somehow (it is listed in the extras), and instead of a hang at install time we get `RuntimeError: FlashAttention only supports Ampere GPUs or newer` at runtime, until the flash attention layer is disabled in the config.
This cell installs it:

```python
!git clone https://github.com/axolotl-ai-cloud/axolotl
%cd axolotl

!pip3 install packaging ninja
!pip3 install -e '.[flash-attn,deepspeed]'
```
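Since flash attention ends up disabled on a T4 anyway, it may also be possible to skip the long flash-attn build entirely by omitting that extra; this is an assumption on my part, not something I verified on Colab:

```python
# Install without the flash-attn extra so pip never compiles its CUDA kernels.
!pip3 install -e '.[deepspeed]'
```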
Then in the config.yaml:

```yaml
flash_attention: false
```
This does run on the free T4 GPU, but it still takes hours to finish and may need a different config.
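A minimal sketch of automating that override, so the same notebook works on both T4 and Ampere runtimes (the config path below is hypothetical; point it at wherever the notebook writes its YAML):

```python
import torch
import yaml

CONFIG_PATH = "qlora.yml"  # hypothetical path; adjust to the notebook's config file

with open(CONFIG_PATH) as f:
    cfg = yaml.safe_load(f)

# A T4 reports compute capability (7, 5); flash-attn needs >= (8, 0).
major, _ = torch.cuda.get_device_capability(0)
cfg["flash_attention"] = major >= 8

with open(CONFIG_PATH, "w") as f:
    yaml.safe_dump(cfg, f)
```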
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.10
axolotl branch-commit
main
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.