lm harness distributed evaluation?
I am trying to evaluate a fine-tuned 70B model with torchrun and am getting an error. Here is my config file:
```yaml
model:
  _component_: torchtune.models.llama3.lora_llama3_70b
  lora_attn_modules: ['q_proj', 'k_proj', 'v_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']
  apply_lora_to_mlp: True
  apply_lora_to_output: True
  lora_rank: 256
  lora_alpha: 512

tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Meta-Llama-3-70B-Instruct/original/tokenizer.model

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Meta-Llama-3-70B-Instruct
  checkpoint_files: [
    model-00001-of-00030.safetensors,
    model-00002-of-00030.safetensors,
    model-00003-of-00030.safetensors,
    model-00004-of-00030.safetensors,
    model-00005-of-00030.safetensors,
    model-00006-of-00030.safetensors,
    model-00007-of-00030.safetensors,
    model-00008-of-00030.safetensors,
    model-00009-of-00030.safetensors,
    model-00010-of-00030.safetensors,
    model-00011-of-00030.safetensors,
    model-00012-of-00030.safetensors,
    model-00013-of-00030.safetensors,
    model-00014-of-00030.safetensors,
    model-00015-of-00030.safetensors,
    model-00016-of-00030.safetensors,
    model-00017-of-00030.safetensors,
    model-00018-of-00030.safetensors,
    model-00019-of-00030.safetensors,
    model-00020-of-00030.safetensors,
    model-00021-of-00030.safetensors,
    model-00022-of-00030.safetensors,
    model-00023-of-00030.safetensors,
    model-00024-of-00030.safetensors,
    model-00025-of-00030.safetensors,
    model-00026-of-00030.safetensors,
    model-00027-of-00030.safetensors,
    model-00028-of-00030.safetensors,
    model-00029-of-00030.safetensors,
    model-00030-of-00030.safetensors,
  ]
  recipe_checkpoint: null
  output_dir: /tmp/Meta-Llama-3-70B-Instruct
  model_type: LLAMA3
resume_from_checkpoint: False

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.alpaca_dataset
  source: personal_data/data
  train_on_input: False
  max_seq_len: 8000
seed: 42
shuffle: True
batch_size: 10

# Optimizer and Scheduler
optimizer:
  _component_: torch.optim.AdamW
  weight_decay: 0.01
  lr: 2e-4
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100

loss:
  _component_: torch.nn.CrossEntropyLoss

# Training
epochs: 10
max_steps_per_epoch: null
gradient_accumulation_steps: 32
compile: False

# Logging
output_dir: /tmp/lora_finetune_output
metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  project: torchtune
log_every_n_steps: 1
log_peak_memory_stats: False

# Environment
device: cuda
dtype: bf16
enable_activation_checkpointing: True
```
When running with this command:

```bash
tune run eleuther_eval --config evalconfig.yml
```

I get this error:

```
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/parameter.py", line 59, in __deepcopy__
    result = type(self)(self.data.clone(memory_format=torch.preserve_format), self.requires_grad)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 14.00 MiB. GPU
```
When I try:

```bash
tune run --nproc_per_node 8 eleuther_eval --config evalconfig.yml
```

it gives a different error:

```
tune run: error: Recipe eleuther_eval does not support distributed training. Please run without torchrun commands.
```
How do I evaluate large models with torchtune?
This is something we're working on closely with the EleutherAI team and hope to provide soon. For now, if you have enough RAM (and patience) you can try running on CPU, though this will likely take a looooong time. You can also try using the accelerate library by following the instructions here: https://github.com/EleutherAI/lm-evaluation-harness#multi-gpu-evaluation-with-hugging-face-accelerate.
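For reference, the accelerate route in that README runs one full model replica per GPU, so it only helps when a single replica fits on one device; here is a minimal sketch, with the checkpoint directory and task as placeholders:

```bash
# Data-parallel evaluation via accelerate: one full model replica per GPU.
# A bf16 70B model needs roughly 70e9 params * 2 bytes ≈ 140 GB for weights
# alone, so this mode will not fit a 70B on a typical single GPU; for that
# size, the README's parallelize=True (sharded) option is the relevant one.
accelerate launch -m lm_eval \
    --model hf \
    --model_args pretrained=<hf_checkpoint_dir> \
    --tasks hellaswag \
    --batch_size 8
```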
Stay tuned for a torchtune native multi-GPU evaluation feature soon!
Awesome, and thank you for the reply! I am excited about the new feature, but in the meantime, I want to try the native LM harness. However, to do that, I need to convert TorchTune weights into HF weights. I am having issues with the conversion for the 70B model, so I have opened another issue for that. Please take a look when you have a chance. https://github.com/pytorch/torchtune/issues/922
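Once the conversion works, I expect a native harness run to look roughly like the sketch below (the checkpoint directory and task are placeholders; parallelize=True shards the model across all visible GPUs instead of running one replica per GPU):

```bash
# Hypothetical native lm-evaluation-harness run on converted HF-format weights.
# parallelize=True splits the 70B model across the visible GPUs (model
# sharding); the pretrained path and task name are placeholders.
lm_eval --model hf \
    --model_args pretrained=<converted_hf_dir>,parallelize=True \
    --tasks hellaswag \
    --batch_size 4
```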
I am a heavy user of Axolotl and TRL but am now switching to TorchTune. I anticipate encountering some bugs during this transition, so I will be opening issues as I come across them. :) Additionally, I would be happy to contribute in any way that I can.
An approach for multi-GPU evaluation via EleutherAI was proposed in #951.