
Failure to reproduce the results in "Subcharacter Information in Japanese Embeddings: When Is It Worth It?"

[Open] yuanzhiKe opened this issue 6 years ago · 4 comments

Hi.

I tried to use this project to reproduce the results in

Karpinska, Marzena, et al. "Subcharacter Information in Japanese Embeddings: When Is It Worth It?." Proceedings of the Workshop on the Relevance of Linguistic Structure in Neural Architectures for NLP. 2018.

I used the openly shared Japanese word vectors (without character or subcharacter information).

However, I found that for each subset the result is much lower than reported in the paper (method = 3CosAdd).

My code to inspect the results is as follows:

import json
import os

# Load the results file produced by the analogy benchmark run.
jBATS_folder = '../analogy_results/word_analogy/JBATS_1.0/'
result_file = os.path.join(jBATS_folder, '19.05.29_21.14.41', 'results.json')
with open(result_file, 'r') as f:
    data = json.load(f)
for subset in data:
    print(len(subset['details']))  # number of questions in this subset
    print(subset['result'])        # reported counts and accuracy

The result is as follows:

2450
{'cnt_questions_correct': 1090, 'cnt_questions_total': 2450, 'accuracy': 0.4448979591836735}
2550
{'cnt_questions_correct': 2558, 'cnt_questions_total': 5000, 'accuracy': 0.5116}
3192
{'cnt_questions_correct': 3537, 'cnt_questions_total': 8192, 'accuracy': 0.4317626953125}
2450
{'cnt_questions_correct': 5636, 'cnt_questions_total': 10642, 'accuracy': 0.529599699304642}
2450
{'cnt_questions_correct': 6832, 'cnt_questions_total': 13092, 'accuracy': 0.5218454017720745}
3192
{'cnt_questions_correct': 7826, 'cnt_questions_total': 16284, 'accuracy': 0.48059444853844263}
2450
{'cnt_questions_correct': 8885, 'cnt_questions_total': 18734, 'accuracy': 0.47427137824276716}
2450
{'cnt_questions_correct': 10426, 'cnt_questions_total': 21184, 'accuracy': 0.49216389728096677}
2450
{'cnt_questions_correct': 11819, 'cnt_questions_total': 23634, 'accuracy': 0.500084623847}
2450
{'cnt_questions_correct': 12853, 'cnt_questions_total': 26084, 'accuracy': 0.49275417880693145}
2450
{'cnt_questions_correct': 13217, 'cnt_questions_total': 28534, 'accuracy': 0.46320179435059927}
2450
{'cnt_questions_correct': 14000, 'cnt_questions_total': 30984, 'accuracy': 0.4518461141234185}
2450
{'cnt_questions_correct': 14365, 'cnt_questions_total': 33434, 'accuracy': 0.42965244960220134}
2450
{'cnt_questions_correct': 14621, 'cnt_questions_total': 35884, 'accuracy': 0.40745178909820534}
2450
{'cnt_questions_correct': 14955, 'cnt_questions_total': 38334, 'accuracy': 0.3901236500234779}
2652
{'cnt_questions_correct': 15478, 'cnt_questions_total': 40986, 'accuracy': 0.3776411457570878}
2450
{'cnt_questions_correct': 15695, 'cnt_questions_total': 43436, 'accuracy': 0.36133621880467814}
2450
{'cnt_questions_correct': 15926, 'cnt_questions_total': 45886, 'accuracy': 0.347077539990411}
2450
{'cnt_questions_correct': 16339, 'cnt_questions_total': 48336, 'accuracy': 0.33802962595167163}
2450
{'cnt_questions_correct': 17286, 'cnt_questions_total': 50786, 'accuracy': 0.3403693931398417}
2450
{'cnt_questions_correct': 17980, 'cnt_questions_total': 53236, 'accuracy': 0.33774137801487714}
2352
{'cnt_questions_correct': 18442, 'cnt_questions_total': 55588, 'accuracy': 0.3317622508455062}
2162
{'cnt_questions_correct': 19739, 'cnt_questions_total': 57750, 'accuracy': 0.3418008658008658}
2450
{'cnt_questions_correct': 19949, 'cnt_questions_total': 60200, 'accuracy': 0.33137873754152825}
2450
{'cnt_questions_correct': 20158, 'cnt_questions_total': 62650, 'accuracy': 0.321755786113328}
2450
{'cnt_questions_correct': 20252, 'cnt_questions_total': 65100, 'accuracy': 0.3110906298003072}
2450
{'cnt_questions_correct': 20253, 'cnt_questions_total': 67550, 'accuracy': 0.2998223538119911}
2450
{'cnt_questions_correct': 20472, 'cnt_questions_total': 70000, 'accuracy': 0.29245714285714286}
2450
{'cnt_questions_correct': 20640, 'cnt_questions_total': 72450, 'accuracy': 0.28488612836438926}
2550
{'cnt_questions_correct': 20657, 'cnt_questions_total': 75000, 'accuracy': 0.27542666666666665}
2450
{'cnt_questions_correct': 20741, 'cnt_questions_total': 77450, 'accuracy': 0.26779857972885734}
2450
{'cnt_questions_correct': 20869, 'cnt_questions_total': 79900, 'accuracy': 0.26118898623279097}
2450
{'cnt_questions_correct': 21030, 'cnt_questions_total': 82350, 'accuracy': 0.2553734061930783}
2450
{'cnt_questions_correct': 21086, 'cnt_questions_total': 84800, 'accuracy': 0.24865566037735848}
2450
{'cnt_questions_correct': 21170, 'cnt_questions_total': 87250, 'accuracy': 0.24263610315186246}
2450
{'cnt_questions_correct': 21242, 'cnt_questions_total': 89700, 'accuracy': 0.23681159420289855}
2450
{'cnt_questions_correct': 21375, 'cnt_questions_total': 92150, 'accuracy': 0.23195876288659795}
2450
{'cnt_questions_correct': 21636, 'cnt_questions_total': 94600, 'accuracy': 0.22871035940803383}
2450
{'cnt_questions_correct': 21859, 'cnt_questions_total': 97050, 'accuracy': 0.2252344152498712}
2450
{'cnt_questions_correct': 22208, 'cnt_questions_total': 99500, 'accuracy': 0.2231959798994975}

The command used to run the task is:

python -m vecto benchmark analogy Karpinska/word/vectors Karpinska/JBATS_1.0 --path_out analogy_results/ --method 3CosAdd

The embeddings and jBATS set are from http://vecto.space/projects/jBATS/

Could you please tell me what the result in the output file means, and how to correctly get the accuracy on each subset?

Thank you.

yuanzhiKe · May 31 '19 07:05

I guess 2450 is 50 * 49, i.e. the number of a:b :: c:d questions. However, even with the published word embeddings, the result is much lower than in the paper. Is there a bug in the program?
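
If that guess is right, the count works out like this (a minimal sketch, assuming 50 word pairs per subcategory, as in the original BATS design):

n_pairs = 50
# Each question a:b :: c:? takes one pair as the premise (a, b) and a
# different pair for the query word c and the gold answer d,
# giving 50 * 49 ordered combinations.
n_questions = n_pairs * (n_pairs - 1)
print(n_questions)  # 2450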

yuanzhiKe · May 31 '19 07:05

I converted the accumulated scores into per-subset scores and plotted the result.

Code:

# Recover per-subset counts by differencing the accumulated totals.
accuracy_reports_c2 = []
prev_correct = 0
prev_total = 0
for data_set in data:
    corrects = data_set['result']['cnt_questions_correct'] - prev_correct
    total = data_set['result']['cnt_questions_total'] - prev_total
    prev_correct = data_set['result']['cnt_questions_correct']
    prev_total = data_set['result']['cnt_questions_total']
    accuracy_c2 = corrects / total
    print(corrects, total, accuracy_c2)
    accuracy_reports_c2.append(accuracy_c2)
# Reorder the subsets to match the category order used in the paper.
accuracy_reports_new_c2 = accuracy_reports_c2[10:20] + accuracy_reports_c2[20:30] + accuracy_reports_c2[0:10] + accuracy_reports_c2[30:40]

# Collect the subcategory ID of each subset and reorder it the same way.
subcat_ids = []
for data_set in data:
    subcat = data_set['experiment_setup']['subcategory']
    print(subcat)
    subcat_id = subcat.split(' ')[0]
    subcat_ids.append(subcat_id)
subcat_ids_new = subcat_ids[10:20] + subcat_ids[20:30] + subcat_ids[0:10] + subcat_ids[30:40]

# Bar chart of per-subcategory accuracy.
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(20, 10))
plt.ylim(top=1)
plt.xticks(range(len(accuracy_reports_new_c2)), subcat_ids_new)
plt.bar(range(len(accuracy_reports_new_c2)), accuracy_reports_new_c2)

Figure: bar chart of accuracy per subcategory (image not reproduced here).

yuanzhiKe · May 31 '19 08:05

Hi, I'm attending NAACL 2019 right now, but I will try to look into your issue soon. Are you at the conference yourself, by any chance? If so, you can catch me here.

undertherain · Jun 02 '19 16:06

Hi,

1. So sorry, there is indeed a bug in the analogy test code: the scores were accumulated across subsets. We have already fixed it, so please try the latest code.
2. For the accuracy of each category, you can simply count the correct examples and divide by the total number of examples.
3. The embeddings we published were trained on both Wikipedia and Mainichi, so the results differ from those reported in the paper. I just re-tested the SG embeddings using the latest code (method = LRCos) and got:

1_inflectional_morphology: 0.8008130081300813
2_derivational_morphology: 0.47478991596638653
3_encyclopedic_semantics: 0.43768115942028984
4_lexicographic_semantics: 0.14549653579676675

These results are higher than the paper's in the inflectional and derivational categories. Let me know if you need embeddings trained only on wiki.
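
For example, with the fix each entry's counts are per-subset, so something like this minimal sketch (assuming results.json keeps the same layout as in your snippet above) prints the accuracy per subcategory directly:

import json

with open('results.json', 'r') as f:
    data = json.load(f)
for subset in data:
    result = subset['result']
    # counts are no longer accumulated across subsets in the fixed code
    accuracy = result['cnt_questions_correct'] / result['cnt_questions_total']
    print(subset['experiment_setup']['subcategory'], accuracy)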

libofang · Jun 04 '19 07:06