
Failed to run fine-tuning (freezing some layers) of an HF model with PiPPy


I'm trying to fine-tune RoBERTa for masked language modeling while freezing its first few layers. The code is very close to the run_mlm.py example from https://github.com/pytorch/PiPPy/tree/main/examples/hf/language-modeling, but I get an error at the gradient-synchronization step after the first forward pass. Details below.

Env

Ubuntu 20.04.4 LTS
Python 3.8.10
transformers 4.32.0
torch 2.0.1+cu117

The differences are:

  1. Using a native PyTorch training loop instead of pippy.hf.PiPPyTrainer or accelerate
  2. Manually choosing split points with annotate_split_points so that every stage gets trainable parameters (see code below)
  3. Other minor modifications needed for a standalone PiPPy run: reading the rank/world-size info that torchrun provides via environment variables, instantiating the optimizer and lr scheduler, removing the explicit backward() call, and so on (a rough sketch of this loop follows the split-point code below)
from pippy.IR import PipeSplitWrapper, annotate_split_points  # import path used by torchpippy 0.1.x

# Keep only the last four encoder layers and the LM head trainable
layers = (
    "roberta.encoder.layer.8",
    "roberta.encoder.layer.9",
    "roberta.encoder.layer.10",
    "roberta.encoder.layer.11",
    "lm_head",
)
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(layers)

# Split before layers 9, 10 and 11, so that every stage holds trainable parameters
annotate_split_points(
    model,
    {
        "roberta.encoder.layer.9": PipeSplitWrapper.SplitPoint.BEGINNING,
        "roberta.encoder.layer.10": PipeSplitWrapper.SplitPoint.BEGINNING,
        "roberta.encoder.layer.11": PipeSplitWrapper.SplitPoint.BEGINNING,
    },
)
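
For context, the training-loop side (points 1 and 3 above) looks roughly like the sketch below. It is simplified: names such as args, train_dataloader and pipe_driver are illustrative, and the instantiate_optimizer / instantiate_lr_scheduler helpers follow the pattern I saw in the PiPPy examples, so the exact calls in my script may differ slightly.

import os
import torch

rank = int(os.environ["RANK"])              # provided by torchrun
world_size = int(os.environ["WORLD_SIZE"])  # provided by torchrun

def run_master(pp_ranks, args):
    ...  # trace the model and build pipe_driver here
    # the parameters live on the remote stages, so the optimizer and
    # lr scheduler are created through the pipeline driver
    optimizer = pipe_driver.instantiate_optimizer(torch.optim.AdamW, lr=args.learning_rate)
    lr_scheduler = pipe_driver.instantiate_lr_scheduler(torch.optim.lr_scheduler.LinearLR)

    for batch in train_dataloader:
        optimizer.zero_grad()
        outputs = pipe_driver(**batch)  # forward and backward both run inside the driver
        optimizer.step()
        lr_scheduler.step()

run_pippy(run_master, args)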

Problem: after freezing the layers, I hit an error during gradient synchronization (in the _sync_replicated_params method); see the traceback below. If I remove the freezing, the script launches successfully and training proceeds.

Traceback (most recent call last):
  File "lm_no_trainer_pippy_ddp.py", line 805, in <module>
    run_pippy(run_master, args)
  File "/home/ubuntu/envs/pippy/lib/python3.8/site-packages/torchpippy-0.1.1-py3.8.egg/pippy/utils.py", line 155, in run_pippy
    run_worker(args.rank, run_func, args, *extra_args)
  File "/home/ubuntu/envs/pippy/lib/python3.8/site-packages/torchpippy-0.1.1-py3.8.egg/pippy/utils.py", line 270, in run_worker
    run_func(my_pp_ranks, args, *extra_args)
  File "lm_no_trainer_pippy_ddp.py", line 718, in run_master
    outputs = pipe_driver(**batch)
  File "/home/ubuntu/envs/pippy/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/envs/pippy/lib/python3.8/site-packages/torchpippy-0.1.1-py3.8.egg/pippy/PipelineDriver.py", line 2185, in forward
    self._sync_replicated_params()
  File "/home/ubuntu/envs/pippy/lib/python3.8/site-packages/torchpippy-0.1.1-py3.8.egg/pippy/PipelineDriver.py", line 1602, in _sync_replicated_params
    synced_value = torch.sum(torch.stack(grad_values), dim=0)
TypeError: expected Tensor as element 0 in argument 0, but got NoneType
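
If I read the traceback correctly, the root cause seems to be that a frozen (requires_grad=False) parameter never receives a gradient, so one of the entries that _sync_replicated_params tries to stack is None. A minimal, PiPPy-free reproduction of the same TypeError (purely illustrative):

import torch

# two copies of the "same" parameter, one of them frozen: its .grad stays None
a = torch.nn.Parameter(torch.ones(2))                        # requires_grad=True by default
b = torch.nn.Parameter(torch.ones(2), requires_grad=False)

(a * 2).sum().backward()        # a.grad becomes a tensor, b.grad is still None

grad_values = [b.grad, a.grad]  # mirrors the list of gradients being stacked
synced_value = torch.sum(torch.stack(grad_values), dim=0)
# TypeError: expected Tensor as element 0 in argument 0, but got NoneType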

So, if someone can explain what I'm doing wrong or show an example of how to do fine-tuning correctly, I would be very grateful.

Unfortunately, I didn't find any examples with frozen layers in the repository, so I think it would be useful to add one as well.

hakob-petro · Sep 05 '23, 14:09