InfiniTransformer

What is the min GPU memory required to fine-tune the model?

Open · Ozawa333 opened this issue 9 months ago · 0 comments

First of all, thank you very much for your work.

I am trying to train Gemma-2B with a 32K sequence length and a 2K segment size on a single A6000 Ada (48 GB). But even after adjusting the parameters in train.gemma.infini.noclm.sh as shown below, I still run out of GPU memory. Is this normal?

accelerate launch --mixed_precision='bf16' \
    train.gemma.infini.noclm.py \
    --model_name_or_path='google/gemma-2b' \
    --segment_length=2048 \
    --block_size=32768 \
    --dataset_name='wikitext' \
    --dataset_config_name='wikitext-2-raw-v1' \
    --per_device_train_batch_size=1 \
    --per_device_eval_batch_size=1 \
    --weight_decay=1.0 \
    --output_dir='./models/gemma-2b-infini-noclm-wikitext' \
    --checkpointing_steps=10 \
    --num_train_epochs=1 \
    --learning_rate=5e-5 \
    --seed=42 \
    --low_cpu_mem_usage \
    --report_to='wandb' \
    --preprocessing_num_workers=64 \
    --with_tracking
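
For context, here is a rough back-of-envelope estimate (my own numbers, not from the repo) of why 48 GB may be tight. It assumes full fine-tuning with vanilla AdamW keeping fp32 master weights, fp32 gradients, and fp32 moment estimates, which is what accelerate's bf16 mixed precision typically retains, and that Gemma-2B has roughly 2.5B parameters:

    # Back-of-envelope memory estimate for full fine-tuning of Gemma-2B.
    # Assumptions (not from the repo): vanilla AdamW with fp32 master
    # weights, fp32 gradients, and fp32 moments; ~2.5B parameters.
    n_params = 2.5e9

    bytes_per_param = (
        4    # fp32 master weights
        + 4  # fp32 gradients
        + 8  # AdamW moments m and v (fp32 each)
    )

    model_states_gb = n_params * bytes_per_param / 1024**3
    print(f"Model + optimizer states alone: ~{model_states_gb:.0f} GB")
    # ~37 GB before counting any activations, so a 48 GB card leaves
    # little headroom for 32K-token training even with 2K segments.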

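If that estimate is in the right ballpark, memory-saving techniques beyond the script's flags may be needed. A minimal sketch of two common ones follows; gradient_checkpointing_enable and AdamW8bit are standard Hugging Face / bitsandbytes APIs, but whether they slot into train.gemma.infini.noclm.py unchanged is my assumption:

    import bitsandbytes as bnb
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

    # Recompute activations during the backward pass instead of storing
    # them, trading extra compute for a large activation-memory saving.
    model.gradient_checkpointing_enable()

    # 8-bit AdamW stores the moment estimates in 8 bits, cutting the
    # optimizer state from ~8 bytes/param to ~2 bytes/param.
    optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=5e-5)
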
Ozawa333 · May 10 '24, 07:05