diffusion
Results: 22 diffusion issues
Frustrated: after training for about 1654ba, the run crashed and failed to save the checkpoint. I tried twice with the same result. The error is as follows: > [E ProcessGroupNCCL.cpp:828] [Rank 3] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=39739, OpType=ALLREDUCE,...
I'm trying to run training with `composer run.py --config-path yamls/hydra-yamls --config-name SD-2-base-256.yaml` after changing the configuration to use a custom data loader. I'm getting a generic error: AttributeError("'IterableDatasetDict'...