Multi-layer structure hypernetwork and LR scheduler
Supports previously trained hypernetworks (1x -> 2x -> 1x simple networks)
Tested creating / training a hypernetwork with the [1, 2, 2, 1] argument. Tested training from an existing hypernetwork named 'anime'.
Does not implement Gradio frontends.
Complex structures might improve hypernetwork performance drastically for large datasets, even though a hypernetwork cannot train an inductive bias into the original model.
Multi-layer structure
As defined by
(HypernetworkModule(size, multipliers=[1, 2, 1]), HypernetworkModule(size, multipliers=[1, 2, 1]))
it is a 1x -> 2x -> 1x fully connected (Linear, or 'dense') layer structure.
It can be changed to [1, 2, 4, 2, 1] (or any other object supporting __getitem__) for more layers, as sketched below.
Dropout is not supported: it is suggested to implement torch.load instead for more complex networks.
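A minimal sketch of how such a multipliers list could expand into a stack of Linear layers. The class body, the residual connection, and the dimension value here are illustrative assumptions, not the exact implementation:

```python
import torch
import torch.nn as nn

class HypernetworkModule(nn.Module):
    """Sketch: a fully connected stack whose layer widths follow multipliers * dim."""
    def __init__(self, dim, multipliers=(1, 2, 1)):
        super().__init__()
        # e.g. dim=768, multipliers=[1, 2, 1] -> Linear(768, 1536), Linear(1536, 768);
        # [1, 2, 4, 2, 1] would expand to four Linear layers instead of two.
        layers = [
            nn.Linear(int(dim * m_in), int(dim * m_out))
            for m_in, m_out in zip(multipliers[:-1], multipliers[1:])
        ]
        self.linear = nn.Sequential(*layers)

    def forward(self, x):
        # Residual connection assumed: the module's output is added to its input.
        return x + self.linear(x)

# One pair of modules per attention dimension, as in the tuple shown above:
size = 768  # illustrative value
modules = (HypernetworkModule(size, [1, 2, 1]), HypernetworkModule(size, [1, 2, 1]))
```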
LR scheduling with CosineAnnealingWarmRestarts and ExponentialLR
ExponentialLR(optimizer, gamma = 0.01 ** (1 / steps))
With steps = 100k, the learning rate is reduced to 0.01x its initial value after 100k scheduler steps.
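A quick check of the decay factor (values are illustrative):

```python
steps = 100_000
gamma = 0.01 ** (1 / steps)
# After `steps` scheduler steps the LR has been multiplied by gamma ** steps,
# which is exactly 0.01 by construction.
print(gamma ** steps)  # ~0.01
```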
CosineAnnealingWarmRestarts periodically increases and decreases the learning rate, which is sometimes useful to escape local minima.
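A minimal sketch of attaching either scheduler to an optimizer. The optimizer choice, learning rate, and restart period are assumptions for illustration, not the values used in this change:

```python
import torch
from torch.optim.lr_scheduler import ExponentialLR, CosineAnnealingWarmRestarts

params = [torch.nn.Parameter(torch.zeros(768))]  # stands in for the hypernetwork weights
optimizer = torch.optim.AdamW(params, lr=5e-3)

steps = 100_000
# Option 1: smooth exponential decay down to 0.01x over `steps` steps.
scheduler = ExponentialLR(optimizer, gamma=0.01 ** (1 / steps))

# Option 2: cosine decay with periodic restarts; the LR drops toward eta_min and
# jumps back up every T_0 steps (period growing by T_mult after each restart).
# scheduler = CosineAnnealingWarmRestarts(optimizer, T_0=1000, T_mult=2, eta_min=1e-5)

for step in range(steps):
    # ... forward pass, loss.backward() ...
    optimizer.step()
    scheduler.step()
```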
It would probably be better to use torch.load to support complex hypernetworks and dropout...