cnn-dailymail Making Dataset for Bengali Language

Open PrithwirajRizu opened this issue 5 years ago • 2 comments

I have two files in Bengali: article.txt and summary.txt. How can I convert them to the corresponding train.bin, val.bin, and test.bin? I couldn't work out how to process my Bengali corpus for this summarization pipeline. Thanks in advance.

Jul 16 '19 06:07 PrithwirajRizu

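Hi @PrithwirajRizu, your story file should look like this:

# Read as UTF-8 so the Bengali text decodes correctly
with open('article.txt', encoding='utf-8') as f:
    article = f.read()
with open('summary.txt', encoding='utf-8') as f:
    summary = f.read()

# Article body, a blank line, then the summary under an @highlight tag
story = article + '\n\n' + '@highlight' + '\n' + summary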

Then follow this to generate train or test data.
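
If article.txt and summary.txt hold many examples rather than one, something like the following might help. It is only a rough sketch under one big assumption: line i of article.txt pairs with line i of summary.txt. The helper names (`build_story`, `write_story_files`) and the numbered `.story` file names are made up for illustration and are not part of the cnn-dailymail scripts.

```python
from pathlib import Path


def build_story(article, summary):
    """Join one article and its summary into the story layout shown above."""
    return article.strip() + "\n\n" + "@highlight" + "\n" + summary.strip() + "\n"


def read_lines(path):
    # Explicit UTF-8 so Bengali text is decoded consistently on every platform.
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def write_story_files(article_path, summary_path, out_dir):
    """Write one .story file per (article, summary) line pair.

    Assumption: the two input files are line-aligned.
    """
    articles = read_lines(article_path)
    summaries = read_lines(summary_path)
    if len(articles) != len(summaries):
        raise ValueError("article and summary files have different line counts")

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, (article, summary) in enumerate(zip(articles, summaries)):
        path = out / f"{i:06d}.story"  # hypothetical naming scheme
        path.write_text(build_story(article, summary), encoding="utf-8")
        written.append(path)
    return written
```

Calling `write_story_files("article.txt", "summary.txt", "stories")` would then leave one story file per example in a `stories` directory, which can be split into train, validation, and test sets afterwards.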

Sep 10 '19 20:09 sagorbrur

I guess each sentence of the summary should be on its own line, with the sentences separated by the "@highlight" tag.
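
A rough sketch of that variant, in case it is useful. As far as I can tell, the original CNN/Daily Mail story files do put each highlight under its own "@highlight" line with blank lines in between, but please check against a real story file. The sentence splitter is an assumption: it treats the Bengali danda "।", "?" and "!" as sentence ends, which will not suit every corpus. Both function names are hypothetical.

```python
import re

# Assumption: a sentence ends with the danda "।", "?" or "!" followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[।?!])\s+")


def split_sentences(text):
    """Split text into sentences at whitespace that follows an end mark."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]


def build_story_per_sentence(article, summary):
    """Put every summary sentence under its own @highlight tag."""
    parts = [article.strip()]
    for sentence in split_sentences(summary):
        parts.append("@highlight")
        parts.append(sentence)
    # Blank lines between blocks, as in the original story files.
    return "\n\n".join(parts) + "\n"
```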

Jun 16 '20 04:06 senjed
