Atul Kumar

Results: 21 comments of Atul Kumar

Thanks for reviewing the code. I have fixed the bug. https://github.com/atulkum/pointer_summarizer/blob/master/training_ptr_gen/train.py#L91 https://github.com/atulkum/pointer_summarizer/blob/master/training_ptr_gen/train.py#L100

I have turned on is_coverage=True after training for 500k iterations. Turning is_coverage=True on from the beginning makes the training unstable.
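For reference, a minimal sketch of that two-phase schedule; the names (`config.is_coverage`, `Train.trainIters`, the checkpoint path) are assumptions about the repo layout, not the exact API — check train.py for the real entry points:

```python
# Sketch only: module/function names below are assumed, not verified against the repo.
from data_util import config                 # repo's config module (assumed)
from training_ptr_gen.train import Train     # Train class in train.py (assumed)

# Phase 1: train without the coverage loss for ~500k iterations
config.is_coverage = False
Train().trainIters(500000, model_file_path=None)

# Phase 2: reload the phase-1 checkpoint and continue with coverage switched on
config.is_coverage = True
Train().trainIters(100000, model_file_path='path/to/phase1_checkpoint')
```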

You are right that it adds branches to the computation graph, but that won't cause NaN. If you are getting NaN, it is probably coming from somewhere else. I tested it on pytorch 0.4...

After how many iterations (with is_coverage = True) are you getting NaN? Did you initialize model_file_path in the code? https://github.com/atulkum/pointer_summarizer/blob/master/training_ptr_gen/train.py#L141 You can try to debug it on the CPU. My...
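If it helps, here is a minimal sketch of the kind of check I'd use to localize the NaN; the variable names (`loss`, `step_coverage_loss`, `iter_step`) are placeholders, not the exact ones in train.py:

```python
import torch

def check_finite(name, tensor, step):
    """Fail fast when a tensor turns NaN/Inf so the offending iteration is easy to find."""
    if torch.isnan(tensor).any() or torch.isinf(tensor).any():
        raise RuntimeError('%s became non-finite at iteration %d' % (name, step))

# Inside the training loop, right after the losses are computed:
#   check_finite('total loss', loss, iter_step)
#   check_finite('coverage loss', step_coverage_loss, iter_step)
# Running on the CPU makes the resulting stack trace and tensor values easier to inspect.
```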

I have uploaded a model [here](https://drive.google.com/open?id=1luUphx8Glc7uSPhKZuvvF8PiH0XR6EdC). I retrained it with is_coverage = True for 200k iterations and did not get NaN. For retraining you should do 3 things: 1)...

```>& log/training_log``` simply redirects the output to the file ```log/training_log```; the ```&``` at the end is for running the program in the background. You might have a ```training_log``` directory created in ```log```...

2GB is too low. You can do 2 things: 1) use a pre-trained embedding, extract the embedding vectors on the CPU, and don't load the embedding into the GPU (a rough sketch is below); 2) use a smaller number of encoding...
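As a rough illustration of option 1), here is a sketch (hypothetical class, not code from this repo) that keeps the embedding table on the CPU and only moves the looked-up vectors for the current batch to the GPU:

```python
import torch
import torch.nn as nn

class CPUEmbedding(nn.Module):
    """Keep the large embedding table on the CPU; ship only per-batch vectors to the GPU."""
    def __init__(self, vocab_size, emb_dim, device):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)  # stays on the CPU
        self.device = device

    def forward(self, token_ids):
        vectors = self.embedding(token_ids.cpu())   # lookup on the CPU
        return vectors.to(self.device)              # transfer only this batch's vectors

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
embed = CPUEmbedding(vocab_size=50000, emb_dim=128, device=device)
batch = torch.randint(0, 50000, (8, 400))           # fake batch of token ids
enc_inputs = embed(batch)                            # (8, 400, 128) on the GPU
```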

Yes, it is not mentioned anywhere in the paper, but the code has it. https://github.com/abisee/pointer-generator/blob/master/attention_decoder.py#L150

I found the paper where a similar kind of attention mechanism is used: [Order Matters: Sequence to sequence for sets](https://arxiv.org/abs/1511.06391)

Thanks for pointing this out. You are right. I have updated my code, but I still need to re-run the experiments. I will update the results after that. Here is...