Abi See

36 comments by Abi See

@rahul-iisc I've had another look at the code. I see your point about

> OOV part of vocab is max_art_oov long. Not all the sequences in a batch will have...
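For context, a minimal sketch of the extended-vocabulary padding being discussed: the fixed-vocabulary distribution gets `max_art_oovs` extra slots for every example in the batch, whether or not a given article actually has that many OOVs (this loosely mirrors the repo's `_calc_final_dist`; the names and shapes below are illustrative, not the exact code):

```python
import numpy as np

def extend_vocab_dist(vocab_dist, max_art_oovs):
    """Pad the fixed-vocab distribution with max_art_oovs extra slots.

    Every example in the batch gets the same number of extra slots, even if it
    has fewer (or zero) in-article OOVs; unused slots simply keep probability 0
    because no attention weight is ever scattered into them.
    """
    batch_size = vocab_dist.shape[0]
    extra_zeros = np.zeros((batch_size, max_art_oovs), dtype=vocab_dist.dtype)
    return np.concatenate([vocab_dist, extra_zeros], axis=1)  # (batch, vsize + max_art_oovs)
```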

I've looked further into this and still don't understand where the NaNs are coming from. I changed the code to detect when a NaN occurs, then dump the attention distribution,...
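A rough sketch of the kind of NaN check described, assuming the loss and attention distributions are available as NumPy values at each step (illustrative only; the actual debugging code isn't shown in this thread):

```python
import numpy as np

def check_for_nan(step, loss, attn_dists):
    # If the loss goes NaN/inf, dump the attention distributions to disk so
    # they can be inspected offline, then stop training.
    if not np.isfinite(loss):
        np.save("attn_dists_step_%d.npy" % step, np.asarray(attn_dists))
        raise Exception("Loss is NaN/inf at step %d; attention dists dumped" % step)
```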

Hello everyone, and thanks for your patience. We've made a few changes that help with the NaN issue.
* We [changed](https://github.com/abisee/pointer-generator/commit/d08c4c5cc358a0e9bdeebb46e47885cd8cdb2760) the way the log of the final distribution is...
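The gist of that kind of change, sketched below with illustrative variable names (the linked commit may differ in detail): take the log of only the gold token's probability, optionally stabilised with a small epsilon, rather than taking the log of the entire final distribution, so zero-probability entries elsewhere can't produce inf/NaN:

```python
import tensorflow as tf  # TF 1.x

def neg_log_likelihood(final_dist, target_ids, batch_size, epsilon=1e-12):
    # final_dist: (batch_size, extended_vsize); target_ids: (batch_size,) int32
    batch_nums = tf.range(0, limit=batch_size)             # (batch_size,)
    indices = tf.stack([batch_nums, target_ids], axis=1)   # (batch_size, 2)
    gold_probs = tf.gather_nd(final_dist, indices)          # prob of each gold token
    return -tf.log(gold_probs + epsilon)                    # epsilon guards against log(0)
```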

I tend to see this kind of output in the earlier phases of training (i.e. when the model is still under-trained). Look at the loss curve on tensorboard -- has...

@makcbe Yes, the `eval` mode is designed to be run concurrently with `train` mode. The idea is you can see the loss on the validation set plotted alongside the loss...
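Concretely, that usually means launching two processes against the same log directory, along these lines (paths and flag values are placeholders; see the repo README for the exact invocation):

```
python run_summarization.py --mode=train --data_path=/path/to/train_* --vocab_path=/path/to/vocab --log_root=/path/to/log --exp_name=myexperiment
python run_summarization.py --mode=eval --data_path=/path/to/val_* --vocab_path=/path/to/vocab --log_root=/path/to/log --exp_name=myexperiment
```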

Hi @LilyZL
1. Yes, repetition is very common (it is one of the two big things we are aiming to fix as noted in the [ACL paper](https://arxiv.org/abs/1704.04368)). That's what the...
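The fix the paper proposes for repetition is the coverage mechanism; here is a simplified sketch of the coverage loss it adds (loosely following the paper; the repo's `_coverage_loss` handles masking and averaging slightly differently):

```python
import tensorflow as tf  # TF 1.x

def coverage_loss(attn_dists, padding_mask):
    # attn_dists: list over decoder steps of (batch_size, attn_len) tensors.
    # padding_mask: (batch_size, num_decoder_steps) float mask for real tokens.
    # Penalise re-attending to source positions that already received attention.
    coverage = tf.zeros_like(attn_dists[0])   # running sum of attention so far
    covlosses = []
    for a in attn_dists:
        covlosses.append(tf.reduce_sum(tf.minimum(a, coverage), axis=1))
        coverage += a
    loss_per_step = tf.stack(covlosses, axis=1) * padding_mask  # drop padded steps
    return tf.reduce_mean(tf.reduce_sum(loss_per_step, axis=1))
```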

Hi @fishermanff Yes, running `run_summarization.py` in train mode should restore your last training checkpoint. I think it's handled by the [supervisor](https://github.com/abisee/pointer-generator/blob/master/run_summarization.py#L133).
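For reference, a minimal TF 1.x sketch of how a `tf.train.Supervisor` resumes from the latest checkpoint in its log directory (the path and the toy train op below are just for illustration, not the repo's code):

```python
import tensorflow as tf  # TF 1.x

global_step = tf.train.get_or_create_global_step()
train_op = tf.assign_add(global_step, 1)  # stand-in for a real training step

# If logdir already contains a checkpoint, prepare_or_wait_for_session()
# restores it instead of re-initialising the variables.
sv = tf.train.Supervisor(logdir="log/myexperiment/train", save_model_secs=60)
with sv.prepare_or_wait_for_session() as sess:
    for _ in range(100):
        sess.run(train_op)  # continues from the restored global_step
```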

We do not plan to add that to this repo, but it should be fairly straightforward to copy that functionality from the TextSum code.

Hi @anubhavmax, the same question has been asked [here](https://github.com/abisee/pointer-generator/issues/21). Yes - the pointer-generator model produces mostly extractive summaries. This is discussed in section 7.2 of the [paper](https://arxiv.org/pdf/1704.04368.pdf). It is the...

Yes, RNNs are very slow to train, especially for long sequences (such as in this project), due to the sequential nature of the recurrent connections. I assume by "brute force",...
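A toy illustration of why the time dimension can't be parallelised (not the project's encoder, just the general recurrence):

```python
import numpy as np

def rnn_forward(inputs, h0, W_xh, W_hh, b):
    # Each hidden state depends on the previous one, so the loop over time
    # steps must run strictly sequentially, however long the sequence is.
    h, states = h0, []
    for x_t in inputs:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b)
        states.append(h)
    return states
```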