
The arm model result cannot be replicated through train_arm.py

Open wzyabcas opened this issue 2 years ago • 6 comments

Hi, I ran data_preprocess.py and train_arms.py with the default configs. The validation loss during the teacher forcing phase was around 0.006 and did not continue to decrease. After removing teacher forcing, it stayed around 0.01 without further improvement. This validation loss curve is quite different from the training curve you reported, and the visualization doesn't look good either. Did you use the script "train_arms.py" for training, and were any special hyperparameters used? I am also curious why there is a joint mismatch between the pretrained model and the model we trained ourselves. Thanks for answering!!

wzyabcas avatar Jul 01 '23 20:07 wzyabcas
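For anyone comparing curves across the two training phases mentioned above: teacher forcing is typically governed by a decaying ratio, and where that decay lands can shift the validation curve a lot. A minimal stdlib-only sketch of the idea (the function names and the linear schedule are hypothetical illustrations, not taken from the IMoS code):

```python
import random

def teacher_forcing_ratio(epoch: int, decay_epochs: int = 50) -> float:
    """Linearly decay the probability of feeding the ground-truth pose
    back into the decoder; 1.0 means every step is teacher forced."""
    return max(0.0, 1.0 - epoch / decay_epochs)

def use_ground_truth(epoch: int, rng: random.Random) -> bool:
    """Sample whether this decoding step uses the ground-truth input
    (teacher forcing) or the model's own previous prediction."""
    return rng.random() < teacher_forcing_ratio(epoch)

rng = random.Random(0)
# Early in training nearly every step is teacher forced ...
early = sum(use_ground_truth(1, rng) for _ in range(1000))
# ... while after the decay window none are, so the model must
# consume its own (noisier) predictions, which raises val loss.
late = sum(use_ground_truth(60, rng) for _ in range(1000))
```

The jump from ~0.006 to ~0.01 validation loss when teacher forcing is removed is consistent with this kind of schedule, since errors start compounding through the autoregressive rollout.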

I have the same problem. In contrast to the official pre-trained model, we cannot re-train the model to the same quality. We hope the author adds more training details to the README file.

@anindita127 Looking forward to your reply!

Dean-UQ avatar Jul 06 '23 03:07 Dean-UQ

Thank you for your interest in our work. We have added the training instructions to our README file. Please note that our model is experimental, and recreating the exact trained model may depend on the system configurations listed in the README. To expedite reproducing and reusing our model and results, we highly recommend using the provided pre-trained weights.

Note that if you are not using the pre-trained weights, you need to comment out these lines: https://github.com/anindita127/IMoS/blob/e1724bef51640db164e77723caf4dda19d95dafa/src/test/test_synthesis.py#L382C37-L382C82

anindita127 avatar Jul 06 '23 04:07 anindita127

Thanks for replying and for adding the training instructions to the README file! I ran train_arms.py the way you describe in the README, but there is still a huge performance gap on the validation set, even though the loss curve on the training set is similar to your pretrained model's. There is a random choice between z_past and z_present in the ARM CVAE model, which might cause some variance, but we trained many times and the results are consistently poor. Can you provide more details about training? Thanks so much!

Besides, I am curious why there is a joint mismatch between the model trained through train_arms.py and the model you pretrained. Did you train the model with another script? If so, could you provide the training script you used, even if it was experimental? Thanks.

@anindita127 Looking forward to your reply! Thanks!

wzyabcas avatar Jul 06 '23 10:07 wzyabcas
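One way to rule out the random z_past/z_present choice as the source of the gap is to pin the RNG so that repeated runs make identical choices; if the metrics still differ from the pretrained model, the sampling step is not to blame. A stdlib-only sketch of that check (the latent labels here are placeholders, not the actual tensors in the IMoS code):

```python
import random

def pick_latents(n: int, seed: int) -> list:
    """Simulate n inference steps where the model picks one of two
    latents uniformly at random, driven by a fixed seed."""
    rng = random.Random(seed)
    return ["z_past" if rng.random() < 0.5 else "z_present" for _ in range(n)]

# Two runs with the same seed make exactly the same choices, so any
# remaining metric gap cannot come from this sampling step.
run_a = pick_latents(100, seed=42)
run_b = pick_latents(100, seed=42)
```

In practice this would mean seeding `random`, NumPy, and the deep learning framework's RNG at the top of the evaluation script before comparing against the pretrained checkpoint.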

@anindita127 @wzyabcas Hi, has this issue been fixed?

xbq1994 avatar Oct 18 '23 09:10 xbq1994

> @anindita127 @wzyabcas Hi, has this issue been fixed?

I couldn't find the problem after reading the code, so I didn't dig further into the project to fix it.

wzyabc2 avatar Oct 25 '23 01:10 wzyabc2

@anindita127 This issue seems quite similar to my other query: is there a different version of the training code that produces your pretrained model's results?

Thanks.

z050209 avatar Jan 25 '24 09:01 z050209

Hi @anindita127,

Similar to the observations from other users above, I wasn't able to reproduce the arm-synthesis results; in fact, the performance gap between the provided pre-trained model and what I can reproduce with the training script is large.

I used the training script and settings provided in the repo and processed the data as described there. I sanity-checked my setup by reproducing the results with the provided pre-trained model, which works.

Here are the loss curves from training. image

Here are the results:

https://github.com/user-attachments/assets/4227b6fc-b57e-43bb-bfac-4d3b34016bc9

https://github.com/user-attachments/assets/9a2466ed-fcc1-48ba-8728-956d52d3816a

https://github.com/user-attachments/assets/90579ae4-fbde-4439-9cda-9c47d65f843e

Please share any tips I could use to debug and reproduce the results. Otherwise, could you train a new model using the code provided in the repo and share the weights and results? That would help with debugging, imo.

Looking forward to your response.

Thanks.

James-1994 avatar Oct 04 '24 19:10 James-1994

Hi, thanks for letting me know. We are not actively maintaining this repository right now, so my suggestion is to use the provided pre-trained model weights, or use them as initial weights for your training. We will get back to you if someone starts working on this repo again! Closing this issue for now.

anindita127 avatar Oct 05 '24 05:10 anindita127
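For anyone following the suggestion to use the pre-trained weights as initialization: the joint mismatch reported earlier in this thread would show up as key or shape disagreements between the checkpoint and a freshly built model. A framework-agnostic sketch of that diff, using plain dicts mapping parameter names to shapes (all names and shapes below are made-up examples, not the actual IMoS layers):

```python
def diff_state_dicts(pretrained: dict, model: dict):
    """Compare two state_dicts (here name -> shape tuple) and report
    everything a strict checkpoint load would reject."""
    missing = sorted(set(model) - set(pretrained))       # in model, absent from checkpoint
    unexpected = sorted(set(pretrained) - set(model))    # in checkpoint, absent from model
    shape_mismatch = sorted(
        k for k in set(model) & set(pretrained) if model[k] != pretrained[k]
    )
    return missing, unexpected, shape_mismatch

# Example: the fresh model has an extra output layer and a wider input
# on one linear layer (e.g. a different joint count).
ckpt = {"decoder.fc.weight": (256, 64), "decoder.fc.bias": (256,)}
model = {
    "decoder.fc.weight": (256, 66),
    "decoder.fc.bias": (256,),
    "decoder.out.weight": (22, 256),
}
missing, unexpected, mismatched = diff_state_dicts(ckpt, model)
```

With PyTorch, the same information comes from comparing `model.state_dict()` keys and tensor shapes against the loaded checkpoint before calling `load_state_dict`; matching layers can then be copied over as initial weights while mismatched ones keep their fresh initialization.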