WikiHow-Dataset

How do you test the Seq-to-Seq with attention model on WikiHow?

Open jiahuanluo opened this issue 5 years ago • 2 comments

Thanks for your work. I have downloaded the WikiHow dataset, but I can't find the headline that corresponds to each paragraph, i.e. the headline/paragraph pairs used for the sentence summarization task. I am interested in the data format you used to test the Seq-to-Seq with attention model. Thanks.

jiahuanluo, Oct 25 '18 07:10

The uploaded file is the concatenation of the paragraphs. I will soon add a file that gives access to the separate paragraphs and their summaries. The same data can be used to run a seq-to-seq model; however, the exact train/validation/test sets used for the experiments might be uploaded as well. You need to create pairs of article and summary, tokenize them, and feed them to the model.
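For anyone following along, here is a minimal sketch of the "create pairs, tokenize, feed to the model" step. It is not the preprocessing used for the experiments: the `article` and `summary` field names, the regex tokenizer, the special tokens, and the length caps of 400 and 100 are all assumptions chosen for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split into word and punctuation tokens (a simple
    # stand-in for whatever tokenizer your model expects).
    return re.findall(r"\w+|[^\w\s]", text.lower())

def build_vocab(token_lists, min_count=1):
    # Map tokens to integer ids, reserving ids for special tokens.
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1, "<s>": 2, "</s>": 3}
    for tok, n in counts.most_common():
        if n >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab, max_len):
    # Truncate to max_len tokens, map to ids, and append end-of-sequence.
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokens[:max_len]]
    return ids + [vocab["</s>"]]

# Hypothetical records; the field names are assumptions, not the dataset's schema.
records = [
    {"article": "Fill the pot with water. Bring it to a boil.",
     "summary": "Boil the water."},
    {"article": "Chop the onions finely. Fry them until golden.",
     "summary": "Fry the onions."},
]

# One (article tokens, summary tokens) pair per record.
pairs = [(tokenize(r["article"]), tokenize(r["summary"])) for r in records]
vocab = build_vocab(a + s for a, s in pairs)
encoded = [(encode(a, vocab, 400), encode(s, vocab, 100)) for a, s in pairs]
```

Each element of `encoded` is a pair of id sequences (source, target) that can be padded and batched for an encoder-decoder model.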

mahnazkoupaee, Oct 26 '18 03:10

I tried. The articles are way too long, and my GPUs have very limited memory when using GloVe word vectors. Even 12 GB of VRAM is not enough. Cutting off the articles may cause the model to underfit. Right now I am filtering tokens based on selected POS and NER tags; I am still waiting for the results and will test after that.
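A rough sketch of the two options mentioned here, truncation and tag-based filtering, in case it helps others. It assumes the tokens have already been tagged by some tagger of your choice; the `(token, pos, entity)` triple layout, the tag names, and both function names are hypothetical.

```python
def truncate(tokens, max_len):
    # Hard cutoff: keep only the first max_len tokens.
    return tokens[:max_len]

def filter_by_tags(tagged_tokens, keep_pos, keep_entities):
    # Keep a token if its POS tag or its entity label is in the selected sets.
    # tagged_tokens is a list of (token, pos, entity) triples; the entity
    # label is an empty string when the token is not part of an entity.
    kept = []
    for token, pos, ent in tagged_tokens:
        if pos in keep_pos or ent in keep_entities:
            kept.append(token)
    return kept

# Hypothetical tagger output for one sentence.
tagged = [
    ("Preheat", "VERB", ""),
    ("the", "DET", ""),
    ("oven", "NOUN", ""),
    ("to", "ADP", ""),
    ("350", "NUM", "QUANTITY"),
    ("degrees", "NOUN", "QUANTITY"),
    (".", "PUNCT", ""),
]

filtered = filter_by_tags(tagged, keep_pos={"NOUN", "VERB"},
                          keep_entities={"QUANTITY"})
short = truncate(filtered, 3)
```

Filtering first and truncating second keeps more content words within the same length budget than truncating the raw article, though whether that helps summary quality is something only the experiment can tell.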

huseinzol05, Feb 07 '19 02:02