
Pretraining example from readme fails in Colab

Open AndisDraguns opened this issue 1 year ago • 5 comments

Running the pretraining example from GitHub fails when run in Google Colab.

!pip install 'litgpt[all]'

!mkdir -p custom_texts
!curl https://www.gutenberg.org/cache/epub/24440/pg24440.txt --output custom_texts/book1.txt
!curl https://www.gutenberg.org/cache/epub/26393/pg26393.txt --output custom_texts/book2.txt

# 1) Download a tokenizer
!litgpt download \
  --repo_id EleutherAI/pythia-160m \
  --tokenizer_only True

# 2) Pretrain the model
!litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts/" \
  --train.max_tokens 10_000_000 \
  --out_dir out/custom-model

# 3) Chat with the model
!litgpt chat \
  --checkpoint_dir out/custom-model/final

@awaelchli

AndisDraguns avatar May 08 '24 14:05 AndisDraguns

Hi there, I just ran this code on an A10G in a Lightning Studio, and it worked fine. Do you have the error message, or more detail on how or why it failed? One hypothesis is that the memory in Colab may not be sufficient, depending on the GPU. However, based on my test, it should only require about 8.62 GB on a GPU that supports bfloat16.
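For reference, a rough back-of-envelope estimate of the model-state portion of that memory figure (a sketch only; the true number also depends on activations, batch size, and sequence length, which this ignores):

```python
# Hedged estimate of model-state memory for pretraining pythia-160m.
# Assumes bf16 weights/gradients and fp32 AdamW moment buffers; the
# remainder of the ~8.62 GB observed would come from activations etc.
params = 162_000_000  # pythia-160m has roughly 162M parameters

bytes_weights = params * 2      # bf16 weights: 2 bytes each
bytes_grads = params * 2        # bf16 gradients: 2 bytes each
bytes_optim = params * 4 * 2    # AdamW: two fp32 moment buffers

total_gb = (bytes_weights + bytes_grads + bytes_optim) / 1e9
print(f"~{total_gb:.1f} GB for weights, gradients, and optimizer state")
```

This lands around 2 GB for model states alone, which is consistent with the full run fitting in under 9 GB once activations are added.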

rasbt avatar May 08 '24 15:05 rasbt

@rasbt we created the issue together at ICLR. I will look into it. This is in Colab, not Studios.

awaelchli avatar May 08 '24 15:05 awaelchli

There is a cache path resolution issue in LitData; it needs to be fixed there. Thanks for the repro: https://github.com/Lightning-AI/litdata/issues/126

awaelchli avatar May 08 '24 21:05 awaelchli