InfiniTransformer
What is the min GPU memory required to fine-tune the model?
First of all, thank you very much for your work.
I am trying to train Gemma-2B with a 32K sequence length and a 2K segment size
on a single A6000 Ada (48 GB).
But even after adjusting the parameters in train.gemma.infini.noclm.sh
as shown below, it still runs out of GPU memory.
Is this normal?
accelerate launch --mixed_precision='bf16' \
train.gemma.infini.noclm.py \
--model_name_or_path='google/gemma-2b' \
--segment_length=2048 \
--block_size=32768 \
--dataset_name='wikitext' \
--dataset_config_name='wikitext-2-raw-v1' \
--per_device_train_batch_size=1 \
--per_device_eval_batch_size=1 \
--weight_decay=1.0 \
--output_dir='./models/gemma-2b-infini-noclm-wikitext' \
--checkpointing_steps=10 \
--num_train_epochs=1 \
--learning_rate=5e-5 \
--seed=42 \
--low_cpu_mem_usage \
--report_to='wandb' \
--preprocessing_num_workers=64 \
--with_tracking
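
For reference, here is a rough back-of-envelope estimate of the training-state memory alone. These are my own numbers, assuming plain AdamW with fp32 master weights under bf16 mixed precision; the actual setup in this repo may differ:

params = 2.5e9               # approximate Gemma-2B parameter count
weights_bf16 = params * 2    # bf16 model weights
grads_bf16   = params * 2    # bf16 gradients
master_fp32  = params * 4    # fp32 master weights kept by mixed precision
adam_m       = params * 4    # Adam first-moment state (fp32)
adam_v       = params * 4    # Adam second-moment state (fp32)

total_bytes = weights_bf16 + grads_bf16 + master_fp32 + adam_m + adam_v
print(f"~{total_bytes / 2**30:.0f} GiB before activations")  # ~37 GiB

If those assumptions hold, roughly 37 GiB of the 48 GiB card is consumed before any activations for the 32K-token context are counted, so running out of memory would not be surprising.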
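If that is indeed the bottleneck, two common mitigations are gradient checkpointing and an 8-bit optimizer. A minimal sketch, assuming the script builds a standard Hugging Face model (the variable names here are my own, not taken from train.gemma.infini.noclm.py):

from transformers import AutoModelForCausalLM
import bitsandbytes as bnb

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

# Recompute activations during the backward pass instead of storing them;
# trades extra compute for a large cut in activation memory.
model.gradient_checkpointing_enable()

# Keep Adam's two moment buffers in 8-bit instead of fp32, shrinking
# optimizer state from roughly 19 GiB to roughly 5 GiB for a 2.5B model.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=5e-5)

Whether these drop in cleanly depends on how the Infini-attention wrapper interacts with checkpointing, so treat this as a starting point rather than a verified fix.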