ChengyuBERT

embedding training config file

Open starry-y opened this issue 1 year ago • 4 comments

Thanks for your work!

I cannot find the file train-embeddings-base-1gpu.json mentioned in README.md, but I did find a bert-wwm-ext_literature file. Does bert-wwm-ext_literature replace the former file?

Thanks a lot!

starry-y avatar Mar 05 '23 03:03 starry-y

Hi, the basic difference between the configurations is the db paths. For embeddings, we use the literature data rather than the official data as training data.

Yes, please use ext_literature as the configuration file.
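To illustrate the point about the configs differing only in their db paths, here is a minimal sketch. The field names and paths below are purely hypothetical, not taken from the repo's actual config files; it only shows the idea of reusing the official config while redirecting the training db to the literature data.

```python
import json

# Hypothetical config — field names and paths are illustrative only,
# not the actual keys used in ChengyuBERT's JSON configs.
official_cfg = {
    "train_db": "data/official/train.db",
    "model": "bert-wwm-ext",
}

# For embedding training, point the db path at the literature data instead;
# everything else stays the same.
literature_cfg = dict(official_cfg, train_db="data/literature/train.db")

print(json.dumps(literature_cfg, indent=2))
```

In practice you would simply pass the bert-wwm-ext_literature config file to the training script rather than editing paths by hand.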

Vimos avatar Mar 05 '23 12:03 Vimos

Ok, thanks for your reply. I have replaced the config file in the terminal.

And I have another question.

In the evaluation stage, what does pretrained/Chinese-word-vector/embeddings refer to?

starry-y avatar Mar 05 '23 14:03 starry-y

Also, I could not find chengyu_synonym_dict in train_embedding.py ...

Sorry to bother you; looking forward to your reply.

starry-y avatar Mar 06 '23 02:03 starry-y

Please refer to https://github.com/VisualJoyce/ChengyuBERT#learning-and-evaluating-chinese-idiom-embeddings

This is a different paper, focusing on embedding learning and evaluation. The data has been shared online.

Vimos avatar Mar 06 '23 06:03 Vimos