
Training stuck at step 72900/200000

Open bhuvanakundumani opened this issue 4 years ago • 7 comments

Hi,

I noticed that training on the CNN dataset gets stuck at step 72900/200000, even though GPU utilization shows 100%. I tried training three times, and every time it got stuck at the same step. I also tried different datasets, and training still gets stuck at the same step (with GPU utilization at 100%). I have attached an image for reference (image: GSUM). I would appreciate your input on this. Thanks.

bhuvanakundumani avatar Feb 01 '21 16:02 bhuvanakundumani

Hi, thanks for opening the issue! This is a problem with the PreSumm code (https://github.com/nlpyang/PreSumm/issues/135). One workaround is to reload checkpoint-72000.pt and resume training.

zdou0830 avatar Feb 02 '21 17:02 zdou0830
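
For anyone applying this workaround, the sketch below shows one way to pick the checkpoint to resume from: the saved checkpoint with the highest step number below the step where training hangs. It assumes checkpoint filenames end in the step number followed by .pt (as in checkpoint-72000.pt above); the function name and directory layout are hypothetical and not part of GSum or PreSumm. The chosen path would then be passed to the training script's resume option (in PreSumm's train.py this appears to be `-train_from`, but check your copy).

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint_before(model_dir: str, stuck_step: int) -> Optional[Path]:
    """Return the checkpoint with the highest step below stuck_step, or None.

    Assumption: checkpoint filenames end in "<step>.pt",
    for example "checkpoint-72000.pt".
    """
    best_step, best_path = -1, None
    for path in Path(model_dir).glob("*.pt"):
        match = re.search(r"(\d+)\.pt$", path.name)
        if match is None:
            continue  # not a step-numbered checkpoint file
        step = int(match.group(1))
        if best_step < step < stuck_step:
            best_step, best_path = step, path
    return best_path
```

With checkpoints saved at steps 70000, 72000 and 74000 and a hang at step 72900, this returns the 72000 checkpoint.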

Thanks @zdou0830

bhuvanakundumani avatar Feb 05 '21 18:02 bhuvanakundumani

Hi @bhuvanakundumani, may I know how you are specifying the data path? I tried different variations, but every time it picks up only bert_output/cnndm,train.0.pt. By the way, bert_output is my output directory.

maheshmylavarapu0057 avatar Mar 26 '21 14:03 maheshmylavarapu0057

Hi, I followed the steps, but my accuracy is too low (see image). May I know how you ran it, @bhuvanakundumani?

gaozhiguang avatar Jul 14 '21 02:07 gaozhiguang

Hi @gaozhiguang, you should probably check your input data. I tried it on biomedical data and it worked fine. Thanks.

bhuvanakundumani avatar Jul 14 '21 05:07 bhuvanakundumani

Thanks @bhuvanakundumani

gaozhiguang avatar Jul 14 '21 05:07 gaozhiguang

Hi @bhuvanakundumani and @zdou0830, I'm sure you've moved on by now. However, I was wondering if you remember how you got the model to run. I am getting stuck on the first training example. How did you get the model to continue past cnndm.train.0.bert.pt and move on to the next files in the data_path directory? I'm currently getting an "EOFError: Ran out of input" error.

git-ekeh avatar Sep 22 '22 02:09 git-ekeh
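
"EOFError: Ran out of input" is the error Python's pickle module raises when it is asked to load an empty file. One possible cause, which is an assumption and not confirmed anywhere in this thread, is that one of the preprocessed .pt files is zero bytes long, for example because preprocessing was interrupted. The sketch below lists any such files in a directory; the function name and the "*.pt" pattern are hypothetical choices for illustration.

```python
from pathlib import Path
from typing import List


def find_empty_shards(data_dir: str, pattern: str = "*.pt") -> List[Path]:
    """Return the zero-byte files in data_dir that match pattern, sorted by path."""
    return sorted(
        p
        for p in Path(data_dir).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )
```

If this returns any files, regenerating them with the preprocessing step would be a reasonable first thing to try. An empty result only rules out this one cause.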