
Error about meta_optimizer and new data

leyi-123 opened this issue 4 years ago · 3 comments

Hello! I'm using my own data to train your model. After line 170, `meta_optimizer.step()`, is executed, line 150, `val_loss, v_ppl = do_learning_fix_step(meta_net, train_iter, val_iter, iterations=config.meta_iteration)`, returns `val_loss` as `tensor(nan, device='cuda:0', grad_fn=<AddBackward>)`, which causes training to fail. I didn't change your code except for `persona_map`, and I want to know what went wrong. Thanks!

leyi-123 avatar Dec 20 '20 06:12 leyi-123

Maybe you need to provide a little more detail.

This could happen for many reasons:

* `train_iter` or `val_iter` is empty
* the learning rate (`lr`) is too high

among other possibilities. A quick sanity check for those first two is sketched below.
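Here is a minimal sketch (not from the repo; `check_iters` and `check_loss` are names I made up) of how you could verify both, assuming `train_iter` / `val_iter` support `len()` (e.g. a torchtext iterator or a `DataLoader`) and `loss` is the tensor your model returns:

```python
import torch

def check_iters(train_iter, val_iter):
    # An empty iterator means the inner loop never runs, so the
    # accumulated loss can end up as 0/0 -> NaN.
    assert len(train_iter) > 0, "train_iter is empty for this persona"
    assert len(val_iter) > 0, "val_iter is empty for this persona"

def check_loss(loss):
    # Fail fast: catch the NaN where it is produced instead of noticing
    # it only after meta_optimizer.step().
    if torch.isnan(loss).any():
        raise RuntimeError("loss is NaN -- try lowering the lr")
```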

andreamad8 avatar Dec 20 '20 11:12 andreamad8

> Maybe you need to provide a little more detail.
>
> This could happen for many reasons:
>
> * `train_iter` or `val_iter` is empty
> * the learning rate (`lr`) is too high
>
> among other possibilities.

Sorry, my earlier question was too vague. Together with my last question, let me explain in detail. I trained your model on my own data, which is in the same format as the example you gave:

[screenshot: example data sample]

The persona part is not a description but a person's ID. I changed the `cluster_persona` function in `data_reader.py` as follows:

[screenshot: modified `cluster_persona` function]

and `persona_map.txt` looks like this:

[screenshot: `persona_map.txt` contents]

When I ran `MAML.py`, I found that I couldn't train the model I wanted. On closer inspection, both `do_evaluation` (defined on line 96 of `MAML.py`) and `do_learning_fix_step` (defined on line 74 of `MAML.py`) return `tensor(nan, device='cuda:0', grad_fn=<AddBackward>)` after `meta_optimizer.step()` (line 170), which causes training to fail. How can I solve this? Thank you very much!
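In case it helps pinpoint things, here is a hypothetical guard (not from your code; `step_with_nan_check` is my own name) that could wrap the meta update so the first NaN is caught where it appears rather than one step later:

```python
import torch

def step_with_nan_check(meta_net, meta_optimizer, loss):
    if torch.isnan(loss).any():
        raise RuntimeError("loss is already NaN before the meta update")
    meta_optimizer.zero_grad()
    loss.backward()
    for name, p in meta_net.named_parameters():
        if p.grad is not None and torch.isnan(p.grad).any():
            raise RuntimeError(f"NaN gradient in {name}")
    # Clipping exploding gradients is a common fix for NaN losses.
    torch.nn.utils.clip_grad_norm_(meta_net.parameters(), max_norm=1.0)
    meta_optimizer.step()
```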

leyi-123 avatar Dec 20 '20 13:12 leyi-123

mmm I see. I really don't know at this point.

I suggest going step by step inside the `do_evaluation` function to check where the loss becomes NaN.
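A general PyTorch trick that might help (not specific to this repo): turn on autograd anomaly detection, which makes `backward()` raise at the exact operation that first produced the NaN.

```python
import torch

torch.autograd.set_detect_anomaly(True)
# ...then run do_learning_fix_step / do_evaluation as usual; the raised
# traceback points at the forward-pass op responsible for the NaN.
```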

Sorry I cannot help much here.

andreamad8 avatar Dec 20 '20 23:12 andreamad8