XML-CNN

The link to the RCV1 dataset is invalid

Open • AppleXY opened this issue 4 years ago • 2 comments

Hi, when I opened the link to the RCV1 dataset, I got "404 Not Found". Could you provide another link to the RCV1 dataset? If possible, could you also provide the other datasets used in your paper? The code is a little hard to understand without the data. Thank you very much!

AppleXY • Mar 10 '20

You can infer the data format from the `load_data` method.

The following line shows that each dataset is a pickle file holding four objects (the last two are never used and can be ignored):

```python
[train, test, vocab, catgy] = pickle.load(fin)
```
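To check a pickle file against this layout, a minimal sketch like the one below can help. It assumes the file is a single pickled four-element sequence in the order shown above. The helper names `load_xml_cnn_pickle` and `summarize` are hypothetical, not part of the repo, and the `encoding` parameter only matters if the file was written under Python 2 (an assumption, not something confirmed in this thread).

```python
import os
import pickle
import tempfile
from typing import Any, List, Tuple


def load_xml_cnn_pickle(path: str, encoding: str = "ASCII") -> Tuple[List[dict], List[dict], Any, Any]:
    """Unpack a dataset pickle into (train, test, vocab, catgy).

    Mirrors the quoted line `[train, test, vocab, catgy] = pickle.load(fin)`.
    `encoding` is forwarded to pickle.load; "latin1" is the usual choice for
    files pickled under Python 2 (an assumption about the original files).
    """
    with open(path, "rb") as fin:
        train, test, vocab, catgy = pickle.load(fin, encoding=encoding)
    return train, test, vocab, catgy


def summarize(path: str, encoding: str = "ASCII") -> str:
    """Return document counts and the keys of the first training document."""
    train, test, _vocab, _catgy = load_xml_cnn_pickle(path, encoding)
    keys = sorted(train[0].keys()) if train else []
    return f"train={len(train)} test={len(test)} keys={keys}"


if __name__ == "__main__":
    # Self-check on a tiny synthetic file; a real run would point at the
    # dataset pickle instead (its file name is not given here).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.p")
        with open(path, "wb") as fout:
            pickle.dump([[{"text": "a b", "catgy": [0]}], [], None, None], fout)
        print(summarize(path))  # train=1 test=0 keys=['catgy', 'text']
```

Only the first two unpacked values are inspected, in line with the note that `vocab` and `catgy` are never used.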

Next, the `load_data_and_labels` method shows that `train` and `test` are each a list of document dicts, where the key `'text'` holds the plain-text document and `'catgy'` holds the label.
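For preparing another dataset in this layout, here is a minimal sketch of writing a compatible pickle. Several parts are assumptions, not taken from the repo: `make_doc` and `write_dataset` are hypothetical helpers, each document's `'catgy'` value is assumed to be a list of integer label ids (XML-CNN is multi-label, but check `load_data_and_labels` for the exact type), and the top-level `vocab` and `catgy` objects are simple placeholder index maps because this thread does not describe their real contents.

```python
import os
import pickle
import tempfile
from typing import Dict, List


def make_doc(text: str, labels: List[int]) -> Dict[str, object]:
    """Build one document dict with the two keys load_data_and_labels reads.

    Assumption: 'catgy' holds a list of integer label ids.
    """
    return {"text": text, "catgy": labels}


def write_dataset(path: str, train: List[dict], test: List[dict]) -> None:
    """Pickle [train, test, vocab, catgy] in the order load_data unpacks it."""
    docs = train + test
    # Placeholders only: vocab and catgy are reportedly never used, and their
    # real structure is unknown here, so simple index maps stand in.
    words = sorted({w for d in docs for w in d["text"].split()})
    vocab = {w: i for i, w in enumerate(words)}
    label_ids = sorted({lab for d in docs for lab in d["catgy"]})
    catgy = {lab: i for i, lab in enumerate(label_ids)}
    with open(path, "wb") as fout:
        pickle.dump([train, test, vocab, catgy], fout)


if __name__ == "__main__":
    train = [make_doc("stocks rose sharply", [0, 3]), make_doc("rain expected today", [1])]
    test = [make_doc("markets fell", [0])]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.p")
        write_dataset(path, train, test)
        with open(path, "rb") as fin:
            tr, te, vocab, catgy = pickle.load(fin)
    print(len(tr), len(te), tr[0]["text"], tr[0]["catgy"])  # 2 1 stocks rose sharply [0, 3]
```

Writing a plain list keeps the file compatible with the single `pickle.load` call quoted above; any preprocessing the original datasets received (tokenization, lowercasing) is not covered by this sketch.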

There's another closed issue providing a link to some other datasets used in the paper.

YipingNUS • Aug 19 '20

Please provide the .p files for the eurlex, wiki10, and amazonCat datasets.

purviprajapati196 • May 08 '21