K-BERT
Fine-tune on English Corpus
I used BERT (model and tokenizer) to change K-BERT into an English version. However, I got poor scores on the classification tasks. If you have K-BERT code for fine-tuning on an English corpus, could you please release it?
For English, please use:
Model: https://share.weiyun.com/5hWivED
Vocab: https://share.weiyun.com/5gBxBYD
However, there is no English KG file suitable for K-BERT. What KG do you use?
Hello, @106753004 @autoliuweijie I also want to implement K-BERT on an English corpus. @autoliuweijie, is the model you mentioned Google's pre-trained BERT on Wiki, or have you already done some fine-tuning on it? Indeed, I use Google BERT (English) as the base model and the Wikidata (Download link) KG to fine-tune a new K-BERT for classification tasks, but fail to get good performance.
Actually, I referred to ERNIE and wondered whether K-BERT can incorporate the Wikidata KG and be fine-tuned on datasets from different domains, such as TACRED and Open Entity. I extracted triples from the KG and tokenized them with the BERT tokenizer, inserting them into the sentence in the same way. Then I followed the same procedure as in the paper. Is there any problem with my implementation?
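For concreteness, here is a minimal sketch of the injection step described above, assuming the Hugging Face `transformers` BertTokenizer and a toy triple dictionary. The entity names, triples, and the simple string replacement are illustrative only, and the soft-position indices and visible matrix from the paper are omitted:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Toy knowledge graph: entity surface form -> list of (relation, object) pairs.
# Real triples would come from Wikidata / DBpedia.
kg = {"Tim Cook": [("chief executive officer", "Apple")]}

def inject_triples(sentence, kg, max_branches=2):
    """Append matching (relation, object) pairs right after each entity mention.

    This only illustrates the sentence-expansion step; K-BERT additionally
    assigns soft positions and builds a visible matrix over the injected branch.
    """
    for entity, triples in kg.items():
        if entity in sentence:
            branch = " ".join(f"{rel} {obj}" for rel, obj in triples[:max_branches])
            sentence = sentence.replace(entity, f"{entity} {branch}", 1)
    return sentence

expanded = inject_triples("Tim Cook announced the new iPhone.", kg)
print(expanded)                        # sentence with the KG branch spliced in
print(tokenizer.tokenize(expanded))    # WordPiece tokens of the expanded sentence
```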
Hello, it seems that the vocab file cannot be downloaded.
Hello, it is difficult to download the models if you don't have a WeChat or QQ account. Can you make them accessible without a login? Thanks.
Hello,
Thanks for sharing! The model file can be downloaded successfully. Any chance you could upload the corresponding vocab file?
Thank you so much!
Sorry. I don't know why the vocab file we uploaded was considered illegal and deleted by the administrator. We are dealing with it and will release the file as soon as possible.
Sorry, we are looking for other free network disk storage options.
You can get the corresponding vocab file from the UER project:
https://github.com/dbiir/UER-py/blob/master/models/google_uncased_en_vocab.txt
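If you want a quick sanity check that the downloaded vocab lines up with Google's uncased English BERT, something like the following should work (assuming the `transformers` package is installed and the file name matches the link above):

```python
from transformers import BertTokenizer

# Vocab file downloaded from the UER project (link above).
with open("google_uncased_en_vocab.txt", encoding="utf-8") as f:
    uer_vocab = [line.rstrip("\n") for line in f]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
print(len(uer_vocab), tokenizer.vocab_size)  # both should be 30522 for uncased English BERT
```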
It works. Thanks for the clarification!
Hey, with regards to English: I extracted some domain-specific triples from the English DBpedia, so that aspect is covered. I used a PyTorch script to convert cased BERT-base to the .bin file required by UER. The model loss doesn't decrease, however. I see that the method `add_knowledge_with_vm` starts at the word level and then breaks tokens down into individual characters. Presumably this is for Chinese character-level embeddings; is there a version for English WordPiece encoding, perhaps byte-pair encoding or even whole words? Many thanks and great work!
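To illustrate the difference (a hedged sketch only; the real change would go inside `add_knowledge_with_vm`, and the helper names below are made up): the Chinese pipeline splits each token into characters, while an English version would split into WordPiece sub-tokens so the pieces stay in the BERT vocabulary.

```python
from transformers import BertTokenizer

# Assumed: an English cased BERT vocab, matching the converted model mentioned above.
tokenizer = BertTokenizer.from_pretrained("bert-base-cased")

def split_token_chinese(token):
    # What the current code effectively does: character-level pieces,
    # which suits Chinese BERT vocabularies but not an English WordPiece vocab.
    return list(token)

def split_token_english(token):
    # English alternative: WordPiece sub-tokens; note that soft positions and
    # the visible matrix must then be assigned per sub-token, not per character.
    return tokenizer.tokenize(token)

print(split_token_chinese("embeddings"))  # ['e', 'm', 'b', 'e', ...]
print(split_token_english("embeddings"))  # sub-word pieces from the cased vocab
```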
Hello, I am a freshman student in this domain, and I also want to apply this model to an English corpus. I hope you could find time to give me some advice on a few questions. 1. Have you solved the problem of using English WordPiece encoding? 2. I don't know how to extract domain-specific triples from the English DBpedia (for example, the computer science domain); could you give me some advice?
Thank you in advance! I am waiting for your reply.
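Regarding question 2, one possible starting point is to query the public DBpedia SPARQL endpoint. This is a sketch only: the category, predicates, and result limit are illustrative, and it uses the `requests` library rather than any K-BERT tooling.

```python
import requests

ENDPOINT = "https://dbpedia.org/sparql"

# Pull triples whose subjects belong to an (illustrative) domain category.
QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
SELECT ?s ?p ?o WHERE {
  ?s dct:subject dbc:Machine_learning .
  ?s ?p ?o .
}
LIMIT 100
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```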
Hello, can you share the triples (English) and the BERT model for testing purposes? Did it finally work?
Did the English dataset finally work? Thanks very much.
Hello, I am a student working on a text classification task, and I'm trying to use K-BERT on a dataset that is purely in English. Though I understand the implementation strategies in K-BERT, I am a little lost on how to apply them to a corpus that is purely in English. I see that the vocab file shared by @autoliuweijie is somehow not accessible. It would be great if you could give me a sense of direction on where to start.
Thank you
Hello, I have received your email and will reply as soon as possible. Have a nice day!