Training loss does not change, and validation FDE error is super high

Open cuihenggang opened this issue 5 years ago • 6 comments

I am trying to train Social-GAN with the code in the repository, but the D loss and G loss stop changing once they reach 1.386 and 0.693, respectively.

Also, the validation FDE error is 11.058

Am I doing the training correctly?

$ PYTHONPATH=. python scripts/train.py --restore_from_checkpoint 0
[INFO: train.py:  118]: Initializing train dataset
[INFO: train.py:  120]: Train dataset size: 2692
[INFO: train.py:  121]: Initializing val dataset
[INFO: train.py:  129]: There are 21 iterations per epoch
[INFO: train.py:  153]: Here is the generator:
[INFO: train.py:  154]: TrajectoryGenerator(
  (encoder): Encoder(
    (encoder): LSTM(64, 64)
    (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
  )
  (decoder): Decoder(
    (decoder): LSTM(64, 128)
    (pool_net): PoolHiddenNet(
      (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
      (mlp_pre_pool): Sequential(
        (0): Linear(in_features=192, out_features=512, bias=True)
        (1): ReLU()
        (2): Linear(in_features=512, out_features=1024, bias=True)
        (3): ReLU()
      )
    )
    (mlp): Sequential(
      (0): Linear(in_features=1152, out_features=1024, bias=True)
      (1): ReLU()
      (2): Linear(in_features=1024, out_features=128, bias=True)
      (3): ReLU()
    )
    (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
    (hidden2pos): Linear(in_features=128, out_features=2, bias=True)
  )
  (pool_net): PoolHiddenNet(
    (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
    (mlp_pre_pool): Sequential(
      (0): Linear(in_features=128, out_features=512, bias=True)
      (1): ReLU()
      (2): Linear(in_features=512, out_features=1024, bias=True)
      (3): ReLU()
    )
  )
  (mlp_decoder_context): Sequential(
    (0): Linear(in_features=1088, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=128, bias=True)
    (3): ReLU()
  )
)
[INFO: train.py:  169]: Here is the discriminator:
[INFO: train.py:  170]: TrajectoryDiscriminator(
  (encoder): Encoder(
    (encoder): LSTM(64, 64)
    (spatial_embedding): Linear(in_features=2, out_features=64, bias=True)
  )
  (real_classifier): Sequential(
    (0): Linear(in_features=64, out_features=1024, bias=True)
    (1): ReLU()
    (2): Linear(in_features=1024, out_features=1, bias=True)
    (3): ReLU()
  )
)
[INFO: train.py:  233]: Starting epoch 1
[INFO: train.py:  278]: t = 1 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.326
[INFO: train.py:  280]:   [D] D_total_loss: 1.326
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 6 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.098
[INFO: train.py:  280]:   [D] D_total_loss: 1.098
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 11 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 0.994
[INFO: train.py:  280]:   [D] D_total_loss: 0.994
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.687
[INFO: train.py:  283]:   [G] G_total_loss: 0.687
[INFO: train.py:  233]: Starting epoch 2
[INFO: train.py:  278]: t = 16 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.097
[INFO: train.py:  280]:   [D] D_total_loss: 1.097
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.462
[INFO: train.py:  283]:   [G] G_total_loss: 0.462
[INFO: train.py:  278]: t = 21 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.669
[INFO: train.py:  280]:   [D] D_total_loss: 1.669
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.534
[INFO: train.py:  283]:   [G] G_total_loss: 0.534
[INFO: train.py:  278]: t = 26 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 3
[INFO: train.py:  278]: t = 31 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 36 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 41 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 4
[INFO: train.py:  278]: t = 46 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 51 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 56 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 5
[INFO: train.py:  278]: t = 61 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 66 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 6
[INFO: train.py:  278]: t = 71 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 76 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 81 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 7
[INFO: train.py:  278]: t = 86 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 91 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 96 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 8
[INFO: train.py:  278]: t = 101 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  294]: Checking stats on val ...
[INFO: train.py:  298]: Checking stats on train ...
[INFO: train.py:  305]:   [val] ade: 7.662
[INFO: train.py:  305]:   [val] ade_l: 16.871
[INFO: train.py:  305]:   [val] ade_nl: 14.038
[INFO: train.py:  305]:   [val] d_loss: 1.386
[INFO: train.py:  305]:   [val] fde: 11.058
[INFO: train.py:  305]:   [val] fde_l: 24.348
[INFO: train.py:  305]:   [val] fde_nl: 20.260
[INFO: train.py:  305]:   [val] g_l2_loss_abs: 21.739
[INFO: train.py:  305]:   [val] g_l2_loss_rel: 21.739
[INFO: train.py:  308]:   [train] ade: 7.913
[INFO: train.py:  308]:   [train] ade_l: 16.727
[INFO: train.py:  308]:   [train] ade_nl: 15.018
[INFO: train.py:  308]:   [train] d_loss: 1.386
[INFO: train.py:  308]:   [train] fde: 11.870
[INFO: train.py:  308]:   [train] fde_l: 25.090
[INFO: train.py:  308]:   [train] fde_nl: 22.527
[INFO: train.py:  308]:   [train] g_l2_loss_abs: 22.713
[INFO: train.py:  308]:   [train] g_l2_loss_rel: 22.713
[INFO: train.py:  315]: New low for avg_disp_error
[INFO: train.py:  321]: New low for avg_disp_error_nl
[INFO: train.py:  335]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_with_model.pt
[INFO: train.py:  337]: Done.
[INFO: train.py:  343]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_no_model.pt
[INFO: train.py:  354]: Done.
[INFO: train.py:  278]: t = 106 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 111 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 9
[INFO: train.py:  278]: t = 116 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 121 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 126 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 10
[INFO: train.py:  278]: t = 131 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 136 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 11
[INFO: train.py:  278]: t = 141 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 146 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 151 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 12
[INFO: train.py:  278]: t = 156 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 161 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 166 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 13
[INFO: train.py:  278]: t = 171 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 176 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 181 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 14
[INFO: train.py:  278]: t = 186 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 191 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 196 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 15
[INFO: train.py:  278]: t = 201 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  294]: Checking stats on val ...
[INFO: train.py:  298]: Checking stats on train ...
[INFO: train.py:  305]:   [val] ade: 7.662
[INFO: train.py:  305]:   [val] ade_l: 16.870
[INFO: train.py:  305]:   [val] ade_nl: 14.037
[INFO: train.py:  305]:   [val] d_loss: 1.386
[INFO: train.py:  305]:   [val] fde: 11.058
[INFO: train.py:  305]:   [val] fde_l: 24.348
[INFO: train.py:  305]:   [val] fde_nl: 20.260
[INFO: train.py:  305]:   [val] g_l2_loss_abs: 21.739
[INFO: train.py:  305]:   [val] g_l2_loss_rel: 21.739
[INFO: train.py:  308]:   [train] ade: 7.910
[INFO: train.py:  308]:   [train] ade_l: 16.640
[INFO: train.py:  308]:   [train] ade_nl: 15.079
[INFO: train.py:  308]:   [train] d_loss: 1.386
[INFO: train.py:  308]:   [train] fde: 11.827
[INFO: train.py:  308]:   [train] fde_l: 24.878
[INFO: train.py:  308]:   [train] fde_nl: 22.545
[INFO: train.py:  308]:   [train] g_l2_loss_abs: 22.704
[INFO: train.py:  308]:   [train] g_l2_loss_rel: 22.704
[INFO: train.py:  315]: New low for avg_disp_error
[INFO: train.py:  321]: New low for avg_disp_error_nl
[INFO: train.py:  335]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_with_model.pt
[INFO: train.py:  337]: Done.
[INFO: train.py:  343]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_no_model.pt
[INFO: train.py:  354]: Done.
[INFO: train.py:  278]: t = 206 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 16
[INFO: train.py:  278]: t = 211 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 216 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 221 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 17
[INFO: train.py:  278]: t = 226 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 231 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 236 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 18
[INFO: train.py:  278]: t = 241 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 246 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 251 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 19
[INFO: train.py:  278]: t = 256 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 261 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 266 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 20
[INFO: train.py:  278]: t = 271 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 276 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 21
[INFO: train.py:  278]: t = 281 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 286 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 291 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  233]: Starting epoch 22
[INFO: train.py:  278]: t = 296 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  278]: t = 301 / 4200
[INFO: train.py:  280]:   [D] D_data_loss: 1.386
[INFO: train.py:  280]:   [D] D_total_loss: 1.386
[INFO: train.py:  283]:   [G] G_discriminator_loss: 0.693
[INFO: train.py:  283]:   [G] G_total_loss: 0.693
[INFO: train.py:  294]: Checking stats on val ...
[INFO: train.py:  298]: Checking stats on train ...
[INFO: train.py:  305]:   [val] ade: 7.662
[INFO: train.py:  305]:   [val] ade_l: 16.870
[INFO: train.py:  305]:   [val] ade_nl: 14.037
[INFO: train.py:  305]:   [val] d_loss: 1.386
[INFO: train.py:  305]:   [val] fde: 11.058
[INFO: train.py:  305]:   [val] fde_l: 24.348
[INFO: train.py:  305]:   [val] fde_nl: 20.260
[INFO: train.py:  305]:   [val] g_l2_loss_abs: 21.739
[INFO: train.py:  305]:   [val] g_l2_loss_rel: 21.739
[INFO: train.py:  308]:   [train] ade: 7.797
[INFO: train.py:  308]:   [train] ade_l: 16.335
[INFO: train.py:  308]:   [train] ade_nl: 14.918
[INFO: train.py:  308]:   [train] d_loss: 1.386
[INFO: train.py:  308]:   [train] fde: 11.681
[INFO: train.py:  308]:   [train] fde_l: 24.471
[INFO: train.py:  308]:   [train] fde_nl: 22.348
[INFO: train.py:  308]:   [train] g_l2_loss_abs: 22.124
[INFO: train.py:  308]:   [train] g_l2_loss_rel: 22.124
[INFO: train.py:  315]: New low for avg_disp_error
[INFO: train.py:  321]: New low for avg_disp_error_nl
[INFO: train.py:  335]: Saving checkpoint to /home/hcui2/clones/sgan/checkpoint_with_model.pt
[INFO: train.py:  337]: Done.

cuihenggang avatar Sep 10 '19 19:09 cuihenggang
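
A note on the two plateau values in the log above: 0.693 is ln 2 and 1.386 is 2 ln 2. That is what a sigmoid cross-entropy GAN loss gives when the discriminator outputs a logit of 0 (probability 0.5) for every trajectory. The sketch below checks the arithmetic in plain Python. It assumes the D loss is the sum of a real term and a fake term, and `bce_with_logits` is an illustrative helper, not a function from the repository.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Sigmoid cross-entropy for a single logit, in the numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Assumption: the discriminator emits a logit of exactly 0 for every input.
logit = 0.0

# Discriminator: one term for real samples (target 1), one for fake (target 0).
d_loss = bce_with_logits(logit, 1.0) + bce_with_logits(logit, 0.0)

# Generator: wants its fake samples to be scored as real (target 1).
g_loss = bce_with_logits(logit, 1.0)

print(round(d_loss, 3), round(g_loss, 3))  # prints: 1.386 0.693
```

At a logit of 0 the target drops out of the formula, so the result is ln 2 per term even if soft or noisy labels are used. Losses pinned at exactly these values therefore suggest a discriminator that has stopped separating real from generated trajectories.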

Hi, I have the same problem. Did you end up fixing this issue? Did training until t = 4200 help?

ntruongv avatar Oct 31 '19 19:10 ntruongv

No luck :(

cuihenggang avatar Oct 31 '19 23:10 cuihenggang

Are you using ReLU or LeakyReLU as the activation?

ZhoubinXM avatar Nov 12 '19 14:11 ZhoubinXM
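
One hedged reading of the activation question: in the discriminator printed in the log, `real_classifier` ends with `ReLU()`, so its score can never be negative. Whenever the last `Linear` layer produces a negative value, the score is exactly 0 and the gradient through that ReLU is 0 as well. Whether this is what happened in the run above is an assumption; the log does not confirm it. A minimal sketch, with illustrative helper names and made-up pre-activation values:

```python
import math


def relu(x: float) -> float:
    return max(x, 0.0)


def relu_grad(x: float) -> float:
    # Derivative of ReLU with respect to its input (taken as 0 at x == 0).
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x: float, slope: float = 0.01) -> float:
    # Derivative of LeakyReLU: a small but nonzero slope for negative inputs.
    return 1.0 if x > 0 else slope


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical negative outputs of the last Linear layer, before the activation.
pre_activations = [-2.0, -0.5, -0.1]

for z in pre_activations:
    prob = sigmoid(relu(z))  # ReLU clamps the logit to 0, so prob is 0.5
    print(z, prob, relu_grad(z), leaky_relu_grad(z))
```

In this sketch every negative pre-activation yields probability 0.5 and a zero ReLU gradient, so no learning signal flows back through that output. LeakyReLU keeps a small nonzero gradient in the same situation.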

Did you try a larger learning rate, e.g. 1e-3? Try reusing the hyperparameters from run_traj.sh; maybe that will fix your problem.

munila avatar Nov 27 '19 15:11 munila

I have finally figured out the issue. You need to train with the run_traj.sh script. The default arguments in train.py don't work. The most important argument is --l2_loss_weight 1 which adds the L2 loss to the generator. Social-GAN needs the L2 loss to train and doesn't work with GAN loss only.

cuihenggang avatar Dec 13 '19 20:12 cuihenggang
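
To illustrate the point about `--l2_loss_weight`: the sketch below shows a generator objective that adds a weighted L2 term, measured between predicted and ground-truth positions, to the adversarial term. This is not the repository's code. `l2_loss` and `generator_total_loss` are hypothetical names, the trajectories are made up, and it assumes the weight simply multiplies a mean squared position error.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def l2_loss(pred: List[Point], truth: List[Point]) -> float:
    """Mean squared distance between predicted and true positions."""
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, truth))
    return total / len(truth)


def generator_total_loss(adv_loss: float, pred: List[Point],
                         truth: List[Point], l2_loss_weight: float) -> float:
    """Adversarial term plus the weighted L2 term (hypothetical combination)."""
    return adv_loss + l2_loss_weight * l2_loss(pred, truth)


# Made-up two-step trajectories and the plateau value of the adversarial loss.
pred = [(0.0, 0.0), (1.0, 1.0)]
truth = [(0.0, 1.0), (1.0, 3.0)]
adv = 0.693

print(generator_total_loss(adv, pred, truth, l2_loss_weight=0.0))
print(generator_total_loss(adv, pred, truth, l2_loss_weight=1.0))
```

With a weight of 0 only the adversarial term remains. This is consistent with the log above, where G_total_loss equals G_discriminator_loss on every iteration. With a weight of 1 the generator also receives a direct signal from the position error, even when the adversarial term is stuck.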

> I have finally figured out the issue. You need to train with the run_traj.sh script. The default arguments in train.py don't work. The most important argument is --l2_loss_weight 1 which adds the L2 loss to the generator. Social-GAN needs the L2 loss to train and doesn't work with GAN loss only.

Hi, I have the same problem. I set --l2_loss_weight to 1, but the D loss and G loss still stay unchanged (1.386 and 0.693 respectively) while the L2 loss keeps changing. Do you know how to fix it?

Viozer avatar Mar 21 '20 08:03 Viozer