_prepare_deepspeed fails to capture the correct kwargs from DummyOptim or DummyScheduler when prepare() is called multiple times
System Info
accelerate==0.34.2
python==3.10
deepspeed==0.15.1
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
- [X] My own task or dataset (give details below)
Reproduction
Hey, since I may want to prepare only certain items depending on my training arguments (for example, I may not want to prepare the scheduler this time), I decided to collect them in a dict and call the prepare function multiple times, since the set of items is not fixed. After that, I use setattr to assign each prepared object back to its attribute. This works perfectly until I change my code to support the DeepSpeed plugin.
from collections import OrderedDict

# handle scheduler manually
accelerator_to_prepare = OrderedDict(
    [
        ("optimizer", self.optimizer),
        ("train_dataloader", self.train_dataloader),
        ("valid_dataloader", self.valid_dataloader),
        ("lr_scheduler", self.lr_scheduler),
        ("model", self.model),
    ]
)
if self.use_gan:
    accelerator_to_prepare["discriminator"] = self.discriminator

# prepare each object with a separate accelerator.prepare() call
for k, v in accelerator_to_prepare.items():
    self.print_global_rank_0(f"start prepare {k}")
    setattr(self, k, self.accelerator.prepare(v))
In the accelerator's _prepare_deepspeed function, the objects passed to prepare() are scanned to find the model, optimizer, and scheduler, and the kwargs they were constructed with are then used to fill in the DeepSpeed config so everything works. But in my case I call prepare() multiple times, so _prepare_deepspeed only sees the objects from the last call, meaning result contains a single item ([model] in my case). It therefore cannot find the kwargs needed for the optimizer and scheduler (which are set to "auto" in the DeepSpeed config), so deepspeed_config_process fails with an error.
# excerpt from Accelerator._prepare_deepspeed: result only contains the objects
# passed to the current prepare() call
model = None
optimizer = None
scheduler = None
for obj in result:
    if isinstance(obj, torch.nn.Module):
        model = obj
    elif isinstance(obj, (torch.optim.Optimizer, DummyOptim)):
        optimizer = obj
    elif (isinstance(obj, (LRScheduler, DummyScheduler))) or (
        type(obj).__name__ in deepspeed.runtime.lr_schedules.VALID_LR_SCHEDULES
    ):
        scheduler = obj
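For reference, passing everything to a single prepare() call should avoid the problem, since _prepare_deepspeed then sees the model, optimizer, and scheduler together and can resolve the "auto" values. Below is a minimal sketch of that workaround adapted from my loop above; it assumes all objects can be prepared at the same time, which is exactly the flexibility I was trying to keep.

# Workaround sketch: prepare everything in one call so _prepare_deepspeed sees
# the optimizer/scheduler alongside the model. accelerator.prepare() returns
# the prepared objects in the same order they were passed in.
prepared = self.accelerator.prepare(*accelerator_to_prepare.values())
for name, obj in zip(accelerator_to_prepare.keys(), prepared):
    setattr(self, name, obj)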
Expected behavior
I think accelerate should handle this scenario, i.e. resolve the "auto" optimizer and scheduler kwargs correctly even when prepare() is called multiple times.