
Fixed proper dataset shuffling

FlameLaw opened this issue 3 years ago • 8 comments

Continuation of #3728. Turns out I wasn't hallucinating: torch.randperm was returning a fixed list as training continued ([1, 93, ...]). This doesn't happen every time; I suspect it is triggered by full VRAM, or by conflicts with multiprocessing such as generating image previews mid-training. So I swapped the function for a NumPy one and it works smoothly, producing a proper shuffle even when torch.randperm fails to be random.

The commit message should be fixed - this fix also applies to hypernetworks.
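The swap described above can be sketched roughly as follows (an assumed shape, not the exact repo diff): a dedicated NumPy Generator keeps its own state, so the shuffle stays random even if the global torch RNG is reseeded elsewhere.

```python
import numpy as np

# Rough sketch of the fix (illustrative, not the exact repo code):
# draw the shuffle order from a dedicated NumPy Generator instead of
# torch.randperm, whose output was observed to become a fixed list.
rng = np.random.default_rng()

def shuffled_indexes(dataset_length: int) -> np.ndarray:
    # Returns a fresh random permutation of [0, dataset_length)
    return rng.permutation(dataset_length)

order = shuffled_indexes(100)
assert sorted(order.tolist()) == list(range(100))  # a valid permutation
```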

FlameLaw avatar Oct 27 '22 17:10 FlameLaw

Does your image preview cycle exactly match the dataset length? Are you also using fixed-seed generation and settings? I agree that we need an independent RNG for shuffling, since it gets fixed when we create an image preview.
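The hazard described here can be illustrated with Python's stdlib RNG as a stand-in for torch's global generator (names are illustrative): if preview generation reseeds the shared global RNG on each cycle, every subsequent "shuffle" repeats the same permutation.

```python
import random

# Stand-in illustration: a shuffle that relies on the shared global RNG.
def shuffle_with_global_rng(n: int) -> list:
    idx = list(range(n))
    random.shuffle(idx)
    return idx

random.seed(1234)          # preview with a fixed seed reseeds the global RNG
first = shuffle_with_global_rng(10)
random.seed(1234)          # the next preview reseeds it again...
second = shuffle_with_global_rng(10)
assert first == second     # ...so the "random" shuffle order repeats
```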

aria1th avatar Oct 27 '22 17:10 aria1th

I found the error in a case where the dataset length and the image generation interval were equal. But the random failure also happened when they were not equal. Weird stuff. I also used a fixed seed and settings.

ghost avatar Oct 27 '22 17:10 ghost

It seems to work, at least for me. I modified the code mid-training, and after a thousand steps it generated more usable images. Thanks, and I hope it lands in master as soon as possible.

mykeehu avatar Oct 27 '22 21:10 mykeehu

Can you use a new Random object from Python's random module, with fixed-seed generation, and then permute from it? We'll need reproducible results, i.e. fixed seeds so the same result can be reproduced.

aria1th avatar Oct 28 '22 13:10 aria1th

simply

```python
from random import Random, randrange
import numpy as np

# in __init__
self.seed = randrange(1 << 32)
self.random = Random(self.seed)

# in shuffle
self.indexes = np.array(self.random.sample(range(self.dataset_length), self.dataset_length))

# add a method so the seed can be fixed later
def fix_seed(self, seed=None):
    self.seed = seed if seed is not None else randrange(1 << 32)
    self.random = Random(self.seed)
```

aria1th avatar Oct 28 '22 13:10 aria1th

This is more of a torch problem, not an algorithm change. I have seen the problem with 100 training images while saving an image every 100 or 107 steps on an RTX 3070, logging each step to CSV. I don't think this can be reproduced with simple random objects.

ghost avatar Oct 28 '22 13:10 ghost

No, I mean it would be better to record the random state and have a unique RNG that is only used for dataset shuffling. That way we can reproduce training with the same dataset + same rate + same setup (well, not with dropout...), which might help research in the future.
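The reproducibility idea can be sketched as follows (illustrative names, not from the PR): a dedicated RNG seeded deterministically per epoch yields an identical shuffle order on every rerun, without touching the global or torch RNG.

```python
from random import Random

# Sketch: derive a distinct, deterministic seed per epoch so the whole
# shuffle sequence is reproducible from a single base seed.
def epoch_shuffle(base_seed: int, epoch: int, n: int) -> list:
    rng = Random(base_seed * 1_000_003 + epoch)  # dedicated RNG, per-epoch seed
    return rng.sample(range(n), n)               # a full permutation of [0, n)

run1 = epoch_shuffle(42, epoch=3, n=16)
run2 = epoch_shuffle(42, epoch=3, n=16)
assert run1 == run2  # same base seed and epoch reproduce the same order
```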

aria1th avatar Oct 28 '22 13:10 aria1th

I can't think of a way to make an endless random permutation with seeds, so if you do make one, you'll have to set a fixed number of training steps. Anyway, that's unrelated to what this PR is trying to solve.

ghost avatar Oct 28 '22 14:10 ghost