latent-diffusion
Error when training on ImageNet
There seems to be an error in the training code. When I train the model on ImageNet with cin-ldm-vq-f8.yaml, the following error occurs:
Traceback (most recent call last):
File "main.py", line 722, in
I have not modified the code and ran it with the default settings. Could you help me solve this problem?
Hi, have you solved this problem? Could you tell me the specific steps you took? Thank you very much!
I don't know the root cause. I simply skipped this function by modifying the pytorch_lightning code so that the model can train for more than one epoch. Skipping it does not seem to affect the training results, though I am not sure, because I only trained for 5 epochs.
Hello, would it be convenient to connect on WeChat to discuss this pytorch-lightning code change?
Look at line 1634 of lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py, which is under the root of your Anaconda environment. It is easy to add a guard there, something like if hook_name != 'on_train_epoch_end': fn(xxxxx)
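For anyone reading later, here is a minimal sketch of what that guard amounts to. It is not the pytorch-lightning source: call_hook and SKIPPED_HOOKS are hypothetical stand-ins, and the hook name string is assumed to be on_train_epoch_end, matching the method name on CUDACallback in main.py. Keep in mind that a guard like this silences every callback registered for that hook, not only the failing one.

```python
# Hypothetical stand-in for a hook dispatcher; not the actual
# pytorch_lightning implementation.
SKIPPED_HOOKS = {"on_train_epoch_end"}  # assumed hook name


def call_hook(hook_name, fn, *args, **kwargs):
    """Run fn unless its hook name is in the skip list."""
    if hook_name not in SKIPPED_HOOKS:
        return fn(*args, **kwargs)
    return None  # hook skipped entirely


calls = []
call_hook("on_train_epoch_end", calls.append, "epoch_end")      # skipped
call_hook("on_train_epoch_start", calls.append, "epoch_start")  # runs
```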
ok. thanks a lot.
I ran into the same issue as well, but it worked after I modified the code. In main.py there is a class CUDACallback whose on_train_epoch_end method raises the error. The method takes an outputs parameter that is never used in its body, so I deleted it from the signature. After that it worked, and my models are training.
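A minimal sketch of why removing the parameter helps, assuming the error is a signature mismatch in which the installed pytorch-lightning version calls the hook with only trainer and pl_module. The traceback above is truncated, so that is an assumption. The classes below are simplified stand-ins, not the real CUDACallback from main.py, and call_epoch_end_hook is a hypothetical substitute for the trainer's dispatch.

```python
class OldCUDACallback:
    # Signature as described in the thread: an extra, unused `outputs` parameter.
    def on_train_epoch_end(self, trainer, pl_module, outputs):
        return "epoch end"


class FixedCUDACallback:
    # Same hook with the unused `outputs` parameter removed.
    def on_train_epoch_end(self, trainer, pl_module):
        return "epoch end"


def call_epoch_end_hook(callback, trainer=None, pl_module=None):
    """Hypothetical trainer dispatch that passes two arguments only (assumption)."""
    return callback.on_train_epoch_end(trainer, pl_module)


def hook_fails(callback):
    """Return True if calling the hook raises a TypeError."""
    try:
        call_epoch_end_hook(callback)
    except TypeError:
        return True
    return False


old_fails = hook_fails(OldCUDACallback())      # missing `outputs` argument
fixed_fails = hook_fails(FixedCUDACallback())  # signature matches the call
```

If the same main.py has to run against a version that still passes outputs, declaring the parameter as outputs=None instead of deleting it would accept both call styles.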