guided_summarization
Training stuck at step 72900/200000
Hi,
I noticed that training on the CNN dataset gets stuck at step 72900/200000, even though GPU utilization shows 100%. I have tried training three times, and it gets stuck at the same step every time. I also tried different datasets, and training still stops at the same step (again with GPU utilization at 100%). I have attached an image for reference. I would appreciate your input on this.
Thanks
Hi, thanks for opening the issue! This is a problem with the PreSumm code (https://github.com/nlpyang/PreSumm/issues/135). One workaround is to reload checkpoint-72000.pt and resume training.
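To apply that workaround you need the newest checkpoint saved before the hang. The helper below is a hypothetical sketch, not part of PreSumm or guided_summarization. It assumes checkpoint file names end in "<step>.pt" (for example model_step_72000.pt or checkpoint-72000.pt) and picks the highest step at or below an optional cap:

```python
import os
import re
from typing import Iterable, Optional

# Assumption: checkpoint file names end in "<step>.pt",
# e.g. "model_step_72000.pt" or "checkpoint-72000.pt".
STEP_RE = re.compile(r"(\d+)\.pt$")


def pick_checkpoint(names: Iterable[str], max_step: Optional[int] = None) -> Optional[str]:
    """Return the name with the highest step number, optionally capped at max_step."""
    best_step, best_name = -1, None
    for name in names:
        match = STEP_RE.search(name)
        if match is None:
            continue  # not a checkpoint file
        step = int(match.group(1))
        if max_step is not None and step > max_step:
            continue  # saved after the point we want to resume from
        if step > best_step:
            best_step, best_name = step, name
    return best_name


def latest_checkpoint(ckpt_dir: str, max_step: Optional[int] = None) -> Optional[str]:
    """Full path of the newest usable checkpoint in ckpt_dir, or None if there is none."""
    name = pick_checkpoint(os.listdir(ckpt_dir), max_step)
    return os.path.join(ckpt_dir, name) if name is not None else None
```

For example, latest_checkpoint("models", max_step=72000) would return the 72000-step file if it exists. PreSumm's train.py accepts a -train_from option for resuming from a checkpoint; it is worth confirming that the guided_summarization fork exposes the same option before relying on it.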
Thanks @zdou0830
Hi @bhuvanakundumani, can I ask how you are specifying the data path? I tried different variants, but every time it only picks up bert_output/cnndm.train.0.pt. (bert_output is my output directory.)
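One thing that may be worth checking: PreSumm's data loader appears to treat the data path as a file-name prefix (for example bert_data/cnndm) rather than as a directory or a single file, and globs for numbered shards of each split. The sketch below mimics that lookup under this assumption; the function names are hypothetical, and the exact pattern used by the guided_summarization fork should be checked in its data loader:

```python
import fnmatch
import glob
from typing import Iterable, List


def shard_pattern(data_prefix: str, corpus_type: str) -> str:
    # Assumed layout: <prefix>.<corpus_type>.<index>[...].pt,
    # e.g. "bert_data/cnndm" + "train" -> "bert_data/cnndm.train.[0-9]*.pt"
    return f"{data_prefix}.{corpus_type}.[0-9]*.pt"


def match_shards(names: Iterable[str], data_prefix: str, corpus_type: str) -> List[str]:
    """Filter candidate paths with the same shell-style pattern glob would use."""
    pattern = shard_pattern(data_prefix, corpus_type)
    return sorted(n for n in names if fnmatch.fnmatch(n, pattern))


def find_shards(data_prefix: str, corpus_type: str = "train") -> List[str]:
    """List the shard files on disk for one split, sorted by file name."""
    return sorted(glob.glob(shard_pattern(data_prefix, corpus_type)))
```

Under this assumption, passing a directory or the full name of one shard yields a pattern that matches nothing or only that one file, while passing the shared prefix picks up every shard of the split.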
Hi, I followed the steps, but my accuracy is very low.
Can I ask how you ran it, @bhuvanakundumani?
Hi @gaozhiguang, you should probably check your input data. I tried it on biomedical data and it worked fine. Thanks.
Thanks @bhuvanakundumani
Hi @bhuvanakundumani and @zdou0830, I'm sure you've moved on by now, but I was wondering whether you remember how you got the model to run. Training gets stuck on the first training file for me. Specifically, how did you get the model to continue past cnndm.train.0.bert.pt and move on to the next files in the data_path directory? I'm currently getting an "EOFError: Ran out of input" error.
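"EOFError: Ran out of input" is the error Python's pickle module raises when it reaches the end of a file before finding a complete object, and torch.load can surface it for an empty or truncated file. A quick check is to try loading every shard and list the ones that are empty or fail. The sketch below is a hypothetical helper; it defaults to plain pickle so it runs without PyTorch, and for real shards you would pass loader=torch.load:

```python
import os
import pickle
from typing import Callable, List, Tuple


def _pickle_load(path: str) -> object:
    with open(path, "rb") as f:
        return pickle.load(f)


def find_bad_shards(
    paths: List[str], loader: Callable[[str], object] = _pickle_load
) -> List[Tuple[str, str]]:
    """Return (path, reason) for every shard that is empty or fails to load."""
    bad = []
    for path in paths:
        if os.path.getsize(path) == 0:
            bad.append((path, "empty file"))  # pickle would report "Ran out of input"
            continue
        try:
            loader(path)
        except Exception as exc:  # truncated files typically raise EOFError or UnpicklingError
            bad.append((path, f"{type(exc).__name__}: {exc}"))
    return bad
```

If a shard shows up here, regenerating it with the preprocessing step is likely more reliable than trying to skip it during training.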