progressive-growing-torch
fade-in layer test
------------
alpha:0
1-alpha:1
[1] grad sum:0.66109210252762
[2] alpha + 1-alpha:0.66109210252762
------------
alpha:0
1-alpha:1
[1] grad sum:-0.11638873815536
[2] alpha + 1-alpha:-0.11638873815536
------------
alpha:0
1-alpha:1
[1] grad sum:0.82711493968964
[2] alpha + 1-alpha:0.82711493968964
------------
alpha:0.00536
1-alpha:0.99464
[1] grad sum:-0.0023111030459404
[2] alpha + 1-alpha:-0.0023110407637432
[E:0][T:8][ 8288/202599] errD(real): 0.0312 | errD(fake): 0.0094 | errG: 0.8557 [Res: 8][Trn(G):0.5%][Trn(D):0.0%][Elp(hr):0.0261]
------------
alpha:0
1-alpha:1
[1] grad sum:0.5822246670723
[2] alpha + 1-alpha:0.5822246670723
------------
alpha:0
1-alpha:1
[1] grad sum:-1.6143760681152
[2] alpha + 1-alpha:-1.6143760681152
------------
alpha:0
1-alpha:1
[1] grad sum:0.58907866477966
[2] alpha + 1-alpha:0.58907866477966
------------
alpha:0.0054
1-alpha:0.9946
[1] grad sum:0.27052643895149
[2] alpha + 1-alpha:0.27052646130323
[E:0][T:8][ 8320/202599] errD(real): 0.0391 | errD(fake): 0.5431 | errG: 0.0795 [Res: 8][Trn(G):0.5%][Trn(D):0.0%][Elp(hr):0.0263]
------------
alpha:0
1-alpha:1
[1] grad sum:1.7794182300568
[2] alpha + 1-alpha:1.7794182300568
------------
alpha:0
1-alpha:1
[1] grad sum:-0.21309423446655
[2] alpha + 1-alpha:-0.21309423446655
------------
alpha:0
1-alpha:1
[1] grad sum:1.1774388551712
[2] alpha + 1-alpha:1.1774388551712
------------
alpha:0.00544
1-alpha:0.99456
[1] grad sum:1.3399093151093
[2] alpha + 1-alpha:1.339909250848
[E:0][T:8][ 8352/202599] errD(real): 0.3095 | errD(fake): 0.0158 | errG: 0.8003 [Res: 8][Trn(G):0.5%][Trn(D):0.0%][Elp(hr):0.0264]
------------
alpha:0
1-alpha:1
[1] grad sum:0.37424358725548
[2] alpha + 1-alpha:0.37424358725548
------------
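The matching values for [1] and [2] above confirm that the fade-in layer back-propagates gradients linearly: each branch receives gradOutput scaled by its own blending weight, so the two sums agree up to float rounding (visible once alpha > 0). Below is a minimal sketch of such an alpha-blending module in Torch's nn style; the class name nn.FadeInLayer and its methods are illustrative, not the repo's actual code.

```lua
require 'nn'

-- Illustrative alpha-blending fade-in module: takes {oldBranch, newBranch}
-- and outputs (1 - alpha) * oldBranch + alpha * newBranch.
local FadeInLayer, parent = torch.class('nn.FadeInLayer', 'nn.Module')

function FadeInLayer:__init()
   parent.__init(self)
   self.alpha = 0   -- ramped from 0 to 1 during the transition phase
   self.gradInput = {torch.Tensor(), torch.Tensor()}
end

function FadeInLayer:updateOutput(input)
   local old, new = input[1], input[2]
   self.output:resizeAs(old):copy(old):mul(1 - self.alpha)
   self.output:add(self.alpha, new)
   return self.output
end

function FadeInLayer:updateGradInput(input, gradOutput)
   -- each branch gets gradOutput scaled by its blending weight, so
   -- (1-alpha)*sum(gradOutput) + alpha*sum(gradOutput) == sum(gradOutput):
   -- the identity the [1]/[2] lines in the log verify
   self.gradInput[1]:resizeAs(gradOutput):copy(gradOutput):mul(1 - self.alpha)
   self.gradInput[2]:resizeAs(gradOutput):copy(gradOutput):mul(self.alpha)
   return self.gradInput
end

-- quick check mirroring the log output above
local m = nn.FadeInLayer()
m.alpha = 0.0054
local go = torch.randn(4)
local g = m:backward({torch.randn(4), torch.randn(4)}, go)
print(g[1]:sum() + g[2]:sum(), go:sum())  -- equal up to float error
```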
I have tested three different freezing strategies:
(1) fade in G, then freeze G; fade in D, then freeze D
(2) fade in G, fade in D, then freeze G and D at the same time when the network grows
(3) fade in G, fade in D, then grow the network without freezing the pre-trained weights.
It seems (2) and (3) do not work well: the loss diverges. (A sketch of the freezing mechanism follows.)
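For clarity, "freeze" here means the pre-trained weights are not updated while the newly grown block fades in. One common Torch pattern for this is to zero the frozen modules' accumulated gradients before the optim step; the helper name freezeGrads and the module list are hypothetical, not the repo's actual API.

```lua
require 'nn'

-- Hypothetical helper: zero the accumulated gradients of the given modules
-- so the subsequent optim update leaves their weights unchanged.
local function freezeGrads(modules)
   for _, m in ipairs(modules) do
      local _, gradParams = m:parameters()
      if gradParams then
         for _, g in ipairs(gradParams) do g:zero() end
      end
   end
end

-- e.g. after netG:backward(input, dG) and before the optim step:
-- freezeGrads({pretrainedBlocksOfG})  -- pretrainedBlocksOfG is hypothetical
```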
I also tested four transition (trans) / stabilization (stab) orderings:
(1) trans G --> stab G --> trans D --> stab D
(2) trans G --> trans D --> stab D (G is frozen after its transition)
(3) trans G and D at the same time --> stab G and D
(4) trans G --> trans D --> trans D,G (G is not frozen after its transition)
It seems (2) is the only case that works; a sketch of that schedule follows.
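To make the working ordering concrete, here is schedule (2) written out as data: G transitions first, is then frozen, and D transitions and stabilizes against the frozen G. Phase names and table fields are illustrative only.

```lua
-- Schedule (2) from the list above.
local schedule = {
   { name = 'trans G', fadeIn = 'G', frozen = {} },
   { name = 'trans D', fadeIn = 'D', frozen = {'G'} },  -- G frozen from here on
   { name = 'stab D',  fadeIn = nil, frozen = {'G'} },
}

for _, phase in ipairs(schedule) do
   -- a real training driver would ramp alpha for phase.fadeIn and zero the
   -- gradients of every network in phase.frozen (see freezeGrads above)
   print(string.format('%-8s fade-in:%-3s frozen:%s', phase.name,
         tostring(phase.fadeIn), table.concat(phase.frozen, ',')))
end
```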