kgt5 pretrained Chinese data

Hello, this work is terrific, and I am happy to have found it. But I have some questions. Could I use your model for Chinese triple data? If so, would I need to train your model again?

Apr 17 '22 03:04 Garethyu

Hi, thanks for your interest!

Yes, you could use the model, but you would have to train it again. Only English/Latin characters were used in the pretraining. You would also probably need to use a different tokenizer, which means training from scratch.
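To get a rough idea of how far your data is from the English/Latin text used in pretraining, you could measure how much of it is Chinese. The helper below is a hypothetical sketch (`cjk_ratio` is an illustrative name, not part of the kgt5 codebase) that counts characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF). It ignores the rarer CJK extension blocks, so treat the result as an approximation.

```python
def cjk_ratio(text):
    """Return the fraction of non-whitespace characters in `text` that fall
    in the CJK Unified Ideographs block (U+4E00 to U+9FFF).

    Hypothetical helper for illustration; extension blocks are not counted.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0  # empty or whitespace-only input: nothing to measure
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def cjk_ratio_of_file(path):
    """Apply cjk_ratio to a whole UTF-8 text file, e.g. a verbalized train.txt."""
    with open(path, encoding="utf-8") as f:
        return cjk_ratio(f.read())
```

If most of your verbalized lines are Chinese, a tokenizer whose vocabulary was built for Latin text is unlikely to split them well, which is consistent with the suggestion above to switch tokenizers.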

Apr 17 '22 05:04 apoorvumang

Thanks a lot! How can I find out the details of your pretrained model? What I want is to run your model on Chinese triples.

Apr 17 '22 08:04 Garethyu

You could try the following:

Convert your KG into verbalized format. This means that for each triple in the train KG, e.g. (obama, president of, USA), you make 2 lines as follows:
a. "predict tail: obama | president of\tUSA"
b. "predict head: USA | president of\tobama"

where '\t' is the tab symbol. Put all these lines in train.txt (create valid.txt and test.txt similarly). Put all the .txt files in the data/your_dataset_name folder.

Train using the command provided, with your_dataset_name as the dataset.
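The conversion step above can be scripted. Below is a minimal sketch, assuming your triples are already loaded as (head, relation, tail) string tuples. `verbalize_triples` and `write_split` are hypothetical helper names, not functions from the kgt5 repository. The sketch produces the two lines per triple in the format described and writes them as UTF-8, which matters for Chinese text.

```python
from pathlib import Path


def verbalize_triples(triples):
    """Turn (head, relation, tail) triples into verbalized input/target lines.

    Each triple yields two lines, one asking for the tail and one asking for
    the head; the input and the target are separated by a tab character.
    """
    lines = []
    for head, relation, tail in triples:
        # A tab or newline inside a name would corrupt the line format.
        for field in (head, relation, tail):
            if "\t" in field or "\n" in field:
                raise ValueError(f"field contains a tab or newline: {field!r}")
        lines.append(f"predict tail: {head} | {relation}\t{tail}")
        lines.append(f"predict head: {tail} | {relation}\t{head}")
    return lines


def write_split(triples, dataset_dir, split):
    """Write one split (e.g. "train", "valid", "test") to <dataset_dir>/<split>.txt."""
    out_dir = Path(dataset_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Explicit UTF-8 so Chinese entity and relation names are written
    # correctly regardless of the platform's default encoding.
    with open(out_dir / f"{split}.txt", "w", encoding="utf-8") as f:
        for line in verbalize_triples(triples):
            f.write(line + "\n")
```

For example, `write_split([("obama", "president of", "USA")], "data/your_dataset_name", "train")` writes the two lines shown above to data/your_dataset_name/train.txt; call it again with your validation and test triples for valid.txt and test.txt.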

Apr 17 '22 11:04 apoorvumang

How could I know the details of your pretrained model?

What specific details are you looking for that are not there in the paper or on https://huggingface.co/apoorvumang/kgt5-base-wikikg90mv2 ?

Apr 17 '22 11:04 apoorvumang

Thanks for your answer! I will convert my data to this format.

Apr 17 '22 12:04 Garethyu

Actually, I am a beginner. What I want to know is this: if I use Chinese triples, I need to train the model again, but what should I change? Do I only replace kgt5/data with my data and then train your model again?

Apr 17 '22 12:04 Garethyu

You would also need to change the tokenizer. The default tokenizer of T5 might not be good enough for Chinese (I'm not sure though).

Apr 17 '22 12:04 apoorvumang

Ok, thanks a lot.

Apr 17 '22 16:04 Garethyu

What is the entity_strings.txt file? What is it used for, and how is it mapped to the entities? The paper talks about using entity and relation descriptions for training. Will we not use the pkl file of relations for WikiKG90Mv2?

May 27 '22 06:05 ankush9812

May I ask whether you have trained the model on Chinese triples?

Oct 17 '23 12:10 px6927