lm harness distributed evaluation?
I am trying to evaluate a fine-tuned 70B model with torchrun and am getting an error. Here is my config file:
```yaml
model:
  _component_: torchtune.models.llama3.lora_llama3_70b
  lora_attn_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']
  apply_lora_to_mlp: True
  apply_lora_to_output: True
  lora_rank: 256
  lora_alpha: 512

tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Meta-Llama-3-70B-Instruct/original/tokenizer.model

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Meta-Llama-3-70B-Instruct
  checkpoint_files: [
    model-00001-of-00030.safetensors,
    model-00002-of-00030.safetensors,
    model-00003-of-00030.safetensors,
    model-00004-of-00030.safetensors,
    model-00005-of-00030.safetensors,
    model-00006-of-00030.safetensors,
    model-00007-of-00030.safetensors,
    model-00008-of-00030.safetensors,
    model-00009-of-00030.safetensors,
    model-00010-of-00030.safetensors,
    model-00011-of-00030.safetensors,
    model-00012-of-00030.safetensors,
    model-00013-of-00030.safetensors,
    model-00014-of-00030.safetensors,
    model-00015-of-00030.safetensors,
    model-00016-of-00030.safetensors,
    model-00017-of-00030.safetensors,
    model-00018-of-00030.safetensors,
    model-00019-of-00030.safetensors,
    model-00020-of-00030.safetensors,
    model-00021-of-00030.safetensors,
    model-00022-of-00030.safetensors,
    model-00023-of-00030.safetensors,
    model-00024-of-00030.safetensors,
    model-00025-of-00030.safetensors,
    model-00026-of-00030.safetensors,
    model-00027-of-00030.safetensors,
    model-00028-of-00030.safetensors,
    model-00029-of-00030.safetensors,
    model-00030-of-00030.safetensors,
  ]
  recipe_checkpoint: null
  output_dir: /tmp/Meta-Llama-3-70B-Instruct
  model_type: LLAMA3
resume_from_checkpoint: False

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.alpaca_dataset
  source: personal_data/data
  train_on_input: False
  max_seq_len: 8000
seed: 42
shuffle: True
batch_size: 10

# Optimizer and Scheduler
optimizer:
  _component_: torch.optim.AdamW
  weight_decay: 0.01
  lr: 2e-4
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100

loss:
  _component_: torch.nn.CrossEntropyLoss

# Training
epochs: 10
max_steps_per_epoch: null
gradient_accumulation_steps: 32
compile: False

# Logging
output_dir: /tmp/lora_finetune_output
metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  project: torchtune
log_every_n_steps: 1
log_peak_memory_stats: False

# Environment
device: cuda
dtype: bf16
enable_activation_checkpointing: True
```
When running with this command:

```bash
tune run eleuther_eval --config evalconfig.yml
```

I get this error:

```
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/parameter.py", line 59, in __deepcopy__
    result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.00 MiB. GPU
```
When I try:

```bash
tune run --nproc_per_node 8 eleuther_eval --config evalconfig.yml
```

it gives a different error:

```
tune run: error: Recipe eleuther_eval does not support distributed training. Please run without torchrun commands.
```
How do I evaluate large models with torchtune?
This is something we're working on closely with the EleutherAI team and hope to provide soon. For now, if you have enough RAM (and patience) you can try running on CPU, though this will likely take a looooong time. You can also try using the accelerate library by following the instructions here: https://github.com/EleutherAI/lm-evaluation-harness#multi-gpu-evaluation-with-hugging-face-accelerate.
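For reference, the accelerate route in that README runs one full model replica per GPU, so it only helps when a single replica fits on one device; here is a minimal sketch, with the checkpoint directory and task as placeholders:

```bash
# Data-parallel evaluation via accelerate: one full model replica per GPU.
# A bf16 70B model needs roughly 70e9 params * 2 bytes ≈ 140 GB for weights
# alone, so this mode will not fit a 70B on a typical single GPU; for that
# size, the README's parallelize=True (sharded) option is the relevant one.
accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=<hf_checkpoint_dir> \
    --tasks hellaswag \
    --batch_size 8
```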
Stay tuned for a torchtune native multi-GPU evaluation feature soon!
Awesome, and thank you for the reply! I am excited about the new feature, but in the meantime, I want to try the native LM harness. However, to do that, I need to convert TorchTune weights into HF weights. I am having issues with the conversion for the 70B model, so I have opened another issue for that. Please take a look when you have a chance. https://github.com/pytorch/torchtune/issues/922
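Once the conversion works, I expect a native harness run to look roughly like the sketch below (the checkpoint directory and task are placeholders; parallelize=True shards the model across all visible GPUs instead of running one replica per GPU):

```bash
# Hypothetical native lm-evaluation-harness run on converted HF-format weights.
# parallelize=True splits the 70B model across the visible GPUs (model
# sharding); the pretrained path and task name are placeholders.
lm_eval --model hf \
    --model_args pretrained=<converted_hf_dir>,parallelize=True \
    --tasks hellaswag \
    --batch_size 4
```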
I am a heavy user of Axolotl and TRL but am now switching to TorchTune. I anticipate encountering some bugs during this transition, so I will be opening issues as I come across them. :) Additionally, I would be happy to contribute in any way that I can.
An approach for multi-GPU evaluation via EleutherAI was proposed in #951.