XML-CNN
The link to the RCV1 dataset is invalid
Hi, when I followed the link to the RCV1 dataset, I got "404 not found". Could you provide another link to the dataset? If possible, could you also provide the other datasets used in your paper? It's a little hard for me to understand the code without the data. Thank you very much!
You can work out the format of the data by looking at the load_data method.
In the following line, you can see that the data is a pickle file containing four attributes (the last two are never used and can thus be ignored):
[train, test, vocab, catgy] = pickle.load(fin)
Then, looking at the load_data_and_labels method, you can see that the train/test data are lists of document dicts, with the key 'text' for the plain-text document and 'catgy' for the label.
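If it helps, here is a minimal sketch of how a toy file in that layout could be written and read back, so the code can be stepped through without the original dataset. The helper names (make_doc, save_dataset, load_dataset), the file name and the sample documents are made up for illustration. Storing 'catgy' as a list of integer label ids is an assumption, as are the empty placeholders for the two unused attributes, so check both against load_data_and_labels before relying on this.

```python
import os
import pickle
import tempfile


def make_doc(text, labels):
    # One document dict with the two keys described above.
    # ASSUMPTION: labels are a list of integer label ids.
    return {"text": text, "catgy": labels}


def save_dataset(path, train, test, vocab=None, catgy=None):
    # Four attributes in the order the loader unpacks them.
    # vocab and catgy are reported as unused, so empty
    # placeholders are written here (an assumption).
    with open(path, "wb") as fout:
        pickle.dump([train, test, vocab or {}, catgy or {}], fout)


def load_dataset(path):
    # Mirrors the unpacking line quoted above.
    with open(path, "rb") as fin:
        [train, test, vocab, catgy] = pickle.load(fin)
    return train, test


# Made-up toy documents, not taken from RCV1.
train = [
    make_doc("stocks fell sharply on monday", [0, 2]),
    make_doc("the team won the championship", [1]),
]
test = [make_doc("the striker scored twice", [1])]

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "toy_dataset.p")
save_dataset(path, train, test)
loaded_train, loaded_test = load_dataset(path)
print(loaded_train[0]["text"], loaded_train[0]["catgy"])
```

A file like this only exercises the loading path; it says nothing about how the real RCV1 file was preprocessed.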
There's another closed issue providing a link to some other datasets used in the paper.
Please provide .p file for eurlex, wiki10, amazonCat datasets