Dan Bochman issues

Results: 4 issues of Dan Bochman

Hydra outputs moved to checkpoints/experiment_name dir

Change Hydra's behaviour so that it writes logs to the user's checkpoints/experiment_name directory. The current behaviour spams output directories with unique timestamps for every run. This solves the bug where datasets such as...


Any specific reason sampling is not in FP16?

During training, the forward method casts to FP16, but during sampling it does not:

```python
@torch.no_grad()
@cast_torch_tensor
def sample(self, *args, **kwargs):
    self.print_untrained_unets()

    if not self.is_main:
        kwargs["use_tqdm"] = False

    output = self.imagen.sample(*args, device=self.device,...
```

Always getting NaNs in long training

I've been experimenting with the LION optimizer in your other (great) Imagen repository. I can share my anecdotal experience and combinations:

- Models of different sizes 0.2B, 0.7B and 1B...

GradientAccumulationPlugin(sync_with_dataloader=True) default behavior is bad for train/val dataloader setup

### System Info

```Shell
- `Accelerate` version: 0.33.0
- Platform: Linux-5.15.0-1067-azure-x86_64-with-glibc2.31
- `accelerate` bash location: /home/azureuser/miniconda3/envs/pytorch/bin/accelerate
- Python version: 3.10.13
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.4.0+cu118 (True)...
```