
Why does the training not stop?

Open ZhaohanWang0217 opened this issue 1 year ago • 1 comments

I reduced total_training_steps, but the training does not stop. Why does it keep running even after the steps are reduced?

ZhaohanWang0217 avatar Nov 10 '23 08:11 ZhaohanWang0217

I found the cause of what you're seeing. In the run_loop function:

def run_loop(self):
    saved = False
    while (
        not self.lr_anneal_steps
        or self.step < self.lr_anneal_steps
        or self.global_step < self.total_training_steps
    ):
        batch, cond = next(self.data)
        self.run_step(batch, cond)
        saved = False
        if (
            self.global_step
            and self.save_interval != -1
            and self.global_step % self.save_interval == 0
        ):
            self.save()
            saved = True
            th.cuda.empty_cache()
            # Run for a finite amount of time in integration tests.
            if os.environ.get("DIFFUSION_TRAINING_TEST", "") and self.step > 0:
                return
        if self.global_step % self.log_interval == 0:
            logger.dumpkvs()

The condition `not self.lr_anneal_steps` always evaluates to True when lr_anneal_steps is left at its default value of 0, and because the three conditions are joined with `or`, the while loop can never exit no matter how small total_training_steps is. As a temporary fix, you can remove `not self.lr_anneal_steps or self.step < self.lr_anneal_steps` so that the loop is governed only by `self.global_step < self.total_training_steps`.
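To make the bug concrete, here is a minimal sketch that isolates the loop condition (the helper names `should_continue` / `should_continue_fixed` are illustrative, not from the repo), assuming lr_anneal_steps defaults to 0 as in the training script:

```python
def should_continue(step, global_step, lr_anneal_steps, total_training_steps):
    """Original condition: the first clause is always True when
    lr_anneal_steps == 0, so the whole `or` chain is always True."""
    return (
        not lr_anneal_steps
        or step < lr_anneal_steps
        or global_step < total_training_steps
    )


def should_continue_fixed(global_step, total_training_steps):
    """Proposed fix: keep only the total_training_steps check."""
    return global_step < total_training_steps


# With the default lr_anneal_steps=0, the original condition stays True
# even long after total_training_steps has been exceeded:
assert should_continue(step=10_000, global_step=10_000,
                       lr_anneal_steps=0, total_training_steps=1_000)

# The fixed condition terminates once total_training_steps is reached:
assert not should_continue_fixed(global_step=10_000, total_training_steps=1_000)
```

Note that this change assumes you are not using LR annealing; if you do set lr_anneal_steps to a positive value, the annealing clause would need to be combined with `and` rather than `or` to get both stopping criteria.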

subminu avatar Sep 24 '24 08:09 subminu