fast_abs_rl

scatter_add

Open ghost opened this issue 4 years ago • 10 comments

In the scatter_add function, the "source" parameter does not work; "src" should be used instead of "source".

ghost, May 01 '20 18:05

It depends on what version of PyTorch you are using. According to the Readme, if you use 0.4.0, the scatter_add method works fine.

But unfortunately, 0.4.0 is not supported on the new GPUs and thus this change is needed.

I personally use GCP and the code was working fine on a K80 GPU. Recently, I shifted to a T4 GPU because of resource availability issues and encountered the same error as you when I shifted to a more recent version of PyTorch.

kailashkarthik9, May 11 '20 04:05

Hi @kailashkarthik9, did you try decoding summaries? I have trained my own model, but I am having difficulty decoding the summaries. I am running the command below to evaluate the full model:

python eval_full_model.py --meteor --decode_dir='/home/ajay/Desktop/new/cnn-dailymail/finished_files/decoded_files/test'

Error: Invalid or corrupt jarfile /home/ajay/Desktop/meteor
Traceback (most recent call last):
  File "eval_full_model.py", line 53, in <module>
    main(args)
  File "eval_full_model.py", line 31, in main
    output = eval_meteor(dec_pattern, dec_dir, ref_pattern, ref_dir)
  File "/home/ajay/Desktop/new/fast_abs_rl/evaluate.py", line 70, in eval_meteor
    output = sp.check_output(cmd.split(' '), universal_newlines=True)
  File "/home/ajay/anaconda3/envs/torch1/lib/python3.5/subprocess.py", line 316, in check_output
    **kwargs).stdout
  File "/home/ajay/anaconda3/envs/torch1/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-Xmx2G', '-jar', '/home/ajay/Desktop/meteor', '/tmp/tmpc_n8dcga/dec.txt', '/tmp/tmpc_n8dcga/ref.txt', '-l', 'en', '-norm']' returned non-zero exit status 1.

I don't know what might have gone wrong; any help would be appreciated. Although it reports an invalid or corrupt jarfile, I have tried different jar files and it still shows the same error.

segsev, May 15 '20 14:05

@segsev did you set the METEOR environment variable to the path of meteor-1.5.jar? From your error it looks like you gave the whole folder as the path. Try giving the path to just the jar file, named "meteor-1.5.jar".
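The check suggested above can be sketched in Python before launching the evaluation. This is only a rough sketch: the function name `diagnose_jar_path` and its return labels are hypothetical, and it assumes, as the thread indicates, that the jar path is read from the METEOR environment variable. It relies on the fact that a jar is a zip archive, so it can tell a folder or a damaged file apart from a readable jar, but it cannot detect a Java or JDK incompatibility.

```python
import os
import zipfile


def diagnose_jar_path(path):
    """Return a short label describing what `path` points at (hypothetical helper)."""
    if not path:
        return "unset"        # environment variable missing or empty
    if os.path.isdir(path):
        return "directory"    # a folder was given instead of the jar file
    if not os.path.isfile(path):
        return "missing"      # nothing exists at this path
    if not zipfile.is_zipfile(path):
        return "not-a-jar"    # file exists but is not a readable zip/jar archive
    return "ok"


if __name__ == "__main__":
    # METEOR is the environment variable name mentioned in this thread.
    print(diagnose_jar_path(os.environ.get("METEOR")))
```

A result of "directory" would match the first error in this thread, where java was handed /home/ajay/Desktop/meteor rather than the jar inside it.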

ghost, May 15 '20 14:05

Hi @know-one-1, thanks for the A2A. I tried providing the path to meteor-1.5.jar, but I am still getting the same error:

python eval_full_model.py --meteor --decode_dir='/home/ajay/Desktop/new/cnn-dailymail/finished_files/decoded_files/val'

Error: Invalid or corrupt jarfile /home/ajay/Desktop/meteor/meteor-1.5.jar
Traceback (most recent call last):
  File "eval_full_model.py", line 53, in <module>
    main(args)
  File "eval_full_model.py", line 31, in main
    output = eval_meteor(dec_pattern, dec_dir, ref_pattern, ref_dir)
  File "/home/ajay/Desktop/new/fast_abs_rl/evaluate.py", line 70, in eval_meteor
    output = sp.check_output(cmd.split(' '), universal_newlines=True)
  File "/home/ajay/anaconda3/envs/torch1/lib/python3.5/subprocess.py", line 316, in check_output
    **kwargs).stdout
  File "/home/ajay/anaconda3/envs/torch1/lib/python3.5/subprocess.py", line 398, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['java', '-Xmx2G', '-jar', '/home/ajay/Desktop/meteor/meteor-1.5.jar', '/tmp/tmpxrfhv9_9/dec.txt', '/tmp/tmpxrfhv9_9/ref.txt', '-l', 'en', '-norm']' returned non-zero exit status 1.

Even running meteor without any arguments gives "Invalid or corrupt jarfile", when it should print the help message. I guess the problem is with the jar file.
Do you have a link to the correct jar file? Have you tried evaluating the decoded summaries?

segsev, May 15 '20 16:05

@segsev http://www.cs.cmu.edu/~alavie/METEOR/ is the one I used, and I think you are using the same link. Yes, I ran my model and evaluated it. I used Python 3.7, but I don't think that matters. A similar issue (non-zero exit status because the command was not being executed) came up while I was evaluating with pyrouge, and it turned out that my pyrouge setup was not correct.

ghost, May 15 '20 17:05

@know-one-1 yeah, in my case the problem was the JDK version, I guess. I was able to run the jar on my Mac, while it was breaking on my Linux system. I updated the JDK and it worked. However, pyrouge is also throwing an error. How did you set up pyrouge?

I am using this pyrouge repo, https://github.com/andersjo/pyrouge, and after running for about 5 minutes it throws this error: No such file or directory: '/Desktop/pyrouge/tools/ROUGE-1.5.5/ROUGE-1.5.5.pl/ROUGE-1.5.5.pl'

My ROUGE setup in bashrc looks like this:

export ROUGE=/Desktop/pyrouge/tools/ROUGE-1.5.5/ROUGE-1.5.5.pl

Did you use the same pyrouge repo?

segsev, May 15 '20 19:05

No, I tried that one but kept getting the error. You can follow https://stackoverflow.com/questions/45894212/installing-pyrouge-gets-error-in-ubuntu to set up pyrouge and then set the path as required. It worked for me.

ghost, May 15 '20 19:05

My export statement is

export ROUGE="/home/ks3740/pyrouge/tools/ROUGE-1.5.5/"

If you look at your error log, it is searching for '/Desktop/pyrouge/tools/ROUGE-1.5.5/ROUGE-1.5.5.pl/ROUGE-1.5.5.pl'. Note that ROUGE-1.5.5.pl appears twice, once as a directory and once as the file.

If you change the export path to '/Desktop/pyrouge/tools/ROUGE-1.5.5/', it should hopefully work!
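To illustrate why the path ends up doubled, here is a minimal sketch. It assumes, based only on the error message in this thread, that the evaluation code appends the script name to whatever ROUGE contains. The helper names `rouge_script_path` and `normalize_rouge_env` are hypothetical and not part of pyrouge or fast_abs_rl.

```python
import posixpath

SCRIPT = "ROUGE-1.5.5.pl"


def rouge_script_path(rouge_env):
    """Mimic the assumed behaviour: the script name is appended to $ROUGE."""
    return posixpath.join(rouge_env, SCRIPT)


def normalize_rouge_env(rouge_env):
    """Return the ROUGE directory even if the value points at the .pl file."""
    trimmed = rouge_env.rstrip("/")
    if posixpath.basename(trimmed) == SCRIPT:
        # The variable names the script itself; step up to its directory.
        return posixpath.dirname(trimmed)
    return trimmed


if __name__ == "__main__":
    wrong = "/Desktop/pyrouge/tools/ROUGE-1.5.5/ROUGE-1.5.5.pl"
    print(rouge_script_path(wrong))                       # doubled path, as in the error
    print(rouge_script_path(normalize_rouge_env(wrong)))  # single script path
```

Under that assumption, pointing ROUGE at the directory rather than at the .pl file gives a path with the script name only once.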

kailashkarthik9, May 16 '20 16:05

Managed to fix it. Thanks @kailashkarthik9 @know-one-1

segsev, May 17 '20 17:05

It depends on what version of PyTorch you are using. According to the Readme, if you use 0.4.0, the scatter_add method works fine.

But unfortunately, 0.4.0 is not supported on the new GPUs and thus this change is needed.

I personally use GCP and the code was working fine on a K80 GPU. Recently, I shifted to a T4 GPU because of resource availability issues and encountered the same error as you when I shifted to a more recent version of PyTorch.

Find copy_summ.py in ./models/. Changing source to src in "source=score.contiguous().view(beam*batch, -1) * copy_prob" and "source=score * copy_prob" works for PyTorch 1.5.0.
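If the same code has to run on both old and new PyTorch versions, one option is a small wrapper that tries the newer keyword first. This is only a sketch: `scatter_add_compat` is a hypothetical helper, not part of PyTorch or fast_abs_rl, and it assumes that an unsupported keyword raises TypeError. Because it catches TypeError broadly, it can also hide an unrelated TypeError raised by the first call.

```python
def scatter_add_compat(target, dim, index, values):
    """Call target.scatter_add with whichever keyword name is accepted.

    `target` is expected to be a tensor-like object exposing scatter_add.
    """
    try:
        # Keyword reported in this thread to work on PyTorch 1.5.0.
        return target.scatter_add(dim, index, src=values)
    except TypeError:
        # Keyword used by the repository code written for PyTorch 0.4.0.
        return target.scatter_add(dim, index, source=values)
```

In copy_summ.py the two calls mentioned above would then pass "score * copy_prob" (or its reshaped form) as `values`, although simply renaming the keyword is the smaller change if only one PyTorch version matters.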

yueguo-50, Jun 09 '21 20:06