NFETC
One-hot label percentage error in Wikim dataset
Hi,
After downloading the corpus and preprocessing it with transform.py, we find that 92.9% of the wikim test samples have one-hot labels. These statistics differ from those reported in the paper.
Our results:
Statistics in the paper:
We preprocess the wikim dataset following the README.md.
We calculate the statistics by adding the following code snippet after line 77 in task.py:
# Number of labels per sample in the full test set (x[-1] is assumed to be the label vector).
label_k = [x[-1].sum() for x in self.full_test_set]
label_one_hot = [x for x in label_k if x == 1]
# Note: x != 1 also counts samples with zero labels, if any exist.
label_multi_hot = [x for x in label_k if x != 1]
logger.info('{}/{} one hot, {}/{} multi hot.'.format(len(label_one_hot), len(label_k), len(label_multi_hot), len(label_k)))
# Same count for the test set.
label_k = [x[-1].sum() for x in self.test_set]
label_one_hot = [x for x in label_k if x == 1]
label_multi_hot = [x for x in label_k if x != 1]
logger.info('test set: {}/{} one hot, {}/{} multi hot.'.format(len(label_one_hot), len(label_k), len(label_multi_hot), len(label_k)))
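For anyone who wants to check the count outside task.py, here is a minimal standalone sketch of the same calculation. It assumes each sample's labels are available as a 0/1 vector; the function name one_hot_stats and the toy input are hypothetical and not part of NFETC. Unlike the snippet above, it reports samples with zero labels separately instead of folding them into the multi-hot count.

```python
def one_hot_stats(label_vectors):
    """Count one-hot, multi-hot, and empty label vectors.

    label_vectors: iterable of 0/1 sequences, one per sample
    (hypothetical input format, assumed for illustration).
    """
    # Number of active labels for each sample.
    counts = [sum(vec) for vec in label_vectors]
    total = len(counts)
    one_hot = sum(1 for c in counts if c == 1)
    multi_hot = sum(1 for c in counts if c > 1)
    empty = sum(1 for c in counts if c == 0)
    # Guard against an empty dataset to avoid division by zero.
    pct_one_hot = 100.0 * one_hot / total if total else 0.0
    return {
        "total": total,
        "one_hot": one_hot,
        "multi_hot": multi_hot,
        "empty": empty,
        "pct_one_hot": pct_one_hot,
    }


if __name__ == "__main__":
    # Toy data only; these are not wikim statistics.
    toy = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
    stats = one_hot_stats(toy)
    print("{one_hot}/{total} one hot ({pct_one_hot:.1f}%), "
          "{multi_hot}/{total} multi hot, {empty}/{total} empty".format(**stats))
```

If the empty count is nonzero on the real data, the multi-hot number from the original snippet would be inflated by that amount.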