DeepSpeed
[BUG] LR scheduler defined in config cannot be overridden by an LR scheduler defined in code and passed to `deepspeed.initialize`
Describe the issue Note: this is not a bug, but an inconsistent design.
In the current implementation of the learning rate (LR) scheduler configuration in DeepSpeed, the LR scheduler is always initialized from the configuration file if it is defined there, regardless of whether a scheduler is provided programmatically. This behavior leads to inconsistency with the optimizer, which can be overwritten programmatically even if it is defined in the configuration file.
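For context, a scheduler defined in the DeepSpeed JSON config looks roughly like the fragment below (the parameter values are illustrative). Whenever such a `scheduler` block is present, it currently takes precedence over any scheduler passed to `deepspeed.initialize`:

```json
{
  "train_batch_size": 8,
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": 0.001,
      "warmup_num_steps": 1000
    }
  }
}
```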
Here is the current implementation of the LR scheduler configuration:
```python
def _configure_lr_scheduler(self, client_lr_scheduler):
    # First check for scheduler in json configuration
    lr_scheduler = self._scheduler_from_config(self.optimizer)
    if lr_scheduler:
        log_dist(f"DeepSpeed using configured LR scheduler = {self.scheduler_name()}", ranks=[0])
        self.lr_scheduler = lr_scheduler
    else:
        if isinstance(client_lr_scheduler, Callable):
            log_dist('DeepSpeed using client callable to create LR scheduler', ranks=[0])
            self.lr_scheduler = client_lr_scheduler(self.basic_optimizer)
        else:
            log_dist('DeepSpeed using client LR scheduler', ranks=[0])
            self.lr_scheduler = client_lr_scheduler

    log_dist(f'DeepSpeed LR Scheduler = {self.lr_scheduler}', ranks=[0])
```
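Stripped of engine details, the precedence implemented above can be sketched as a standalone function (names here are illustrative, not DeepSpeed API): the config-derived scheduler, when present, always shadows the client-supplied one.

```python
from typing import Any, Callable, Optional, Union

def resolve_scheduler_current(
    config_scheduler: Optional[Any],
    client_scheduler: Union[Any, Callable[[Any], Any], None],
    optimizer: Any = None,
) -> Any:
    """Current precedence: a scheduler from the JSON config always wins."""
    if config_scheduler is not None:
        return config_scheduler
    if callable(client_scheduler):
        # A callable client scheduler acts as a factory taking the optimizer.
        return client_scheduler(optimizer)
    return client_scheduler

# Even though the caller passed a scheduler, the config one is used:
chosen = resolve_scheduler_current("from-config", "from-client")
print(chosen)  # from-config
```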
Expected behavior We should be able to override the LR scheduler defined in the config. Ideally, I would prefer something like:
```python
def _configure_lr_scheduler(self, client_lr_scheduler):
    # First check for a client-provided scheduler, then fall back to the json configuration
    if client_lr_scheduler:
        if isinstance(client_lr_scheduler, Callable):
            log_dist('DeepSpeed using client callable to create LR scheduler', ranks=[0])
            self.lr_scheduler = client_lr_scheduler(self.basic_optimizer)
        else:
            log_dist('DeepSpeed using client LR scheduler', ranks=[0])
            self.lr_scheduler = client_lr_scheduler
    else:
        lr_scheduler = self._scheduler_from_config(self.optimizer)
        log_dist(f"DeepSpeed using configured LR scheduler = {self.scheduler_name()}", ranks=[0])
        self.lr_scheduler = lr_scheduler

    log_dist(f'DeepSpeed LR Scheduler = {self.lr_scheduler}', ranks=[0])
```
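The proposed ordering inverts the check, mirroring how the optimizer is handled: a programmatically supplied scheduler takes precedence, and the config is only consulted as a fallback. A minimal standalone sketch (illustrative names, not DeepSpeed API):

```python
from typing import Any, Callable, Optional, Union

def resolve_scheduler_proposed(
    config_scheduler_factory: Callable[[], Optional[Any]],
    client_scheduler: Union[Any, Callable[[Any], Any], None],
    optimizer: Any = None,
) -> Any:
    """Proposed precedence: the client-supplied scheduler wins."""
    if client_scheduler is not None:
        if callable(client_scheduler):
            # A callable client scheduler acts as a factory taking the optimizer.
            return client_scheduler(optimizer)
        return client_scheduler
    # Fall back to the scheduler built from the JSON config, if any.
    return config_scheduler_factory()

# Now the client-supplied scheduler overrides the config one:
chosen = resolve_scheduler_proposed(lambda: "from-config", "from-client")
print(chosen)  # from-client
```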
Why does the current design enforce the initialization of the LR scheduler from the configuration file if it is defined there, while allowing the optimizer to be overwritten programmatically?
@xiyang-aads-lilly, this is a good catch. Your proposed solution looks reasonable. Are you able to provide a PR? Thanks!
Sure. I will create a PR next week.
Closing as this issue was resolved by #5846. Please feel free to reopen when needed.