kgt5

Question about KGC datasets

Open PlusRoss opened this issue 2 years ago • 5 comments

Hi,

Thanks for the great work! Could you also share the KGC datasets FB15k-237 and WN18RR (in the same format as wikidata5m)? BTW, I also see the dataset codex-m in the shared data but did not find results for it in your paper. Did you also conduct experiments on codex-m?

PlusRoss avatar Jul 19 '22 21:07 PlusRoss

Hi Apoorv, thanks for the great work! I can't find entity_strings.txt at data/wikidata5m/entity_strings.txt, but the code needs it. How can I get it?

kkydp avatar Jul 29 '22 06:07 kkydp

Hi, thanks for your interest.

@PlusRoss We did some preliminary experiments, but unfortunately we don't have any final numbers to share.

@2682989487 You can run the script https://github.com/apoorvumang/kgt5/blob/main/data/get_unique_entities.py to get the entity strings.

apoorvumang avatar Aug 10 '22 07:08 apoorvumang

Hi Apoorv, I found that after running get_unique_entities.py, some samples in the training data contain three | characters, which causes the program to fail. This problem was mentioned in issue #2. Has it been solved?

github-cqy avatar Oct 06 '22 06:10 github-cqy
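For anyone hitting the same failure, a minimal sketch for locating the offending samples before training. It assumes a well-formed line contains at most two | characters; that threshold, the function name, and the sample lines are illustrative assumptions, not taken from the kgt5 code.

```python
from typing import Iterable, List, Tuple


def find_malformed_lines(
    lines: Iterable[str],
    max_separators: int = 2,  # assumption: a valid sample has at most two "|"
    separator: str = "|",
) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs that contain too many separators."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.count(separator) > max_separators:
            bad.append((number, line.rstrip("\n")))
    return bad


# Hypothetical sample lines, only to show the check; not real dataset rows.
sample = [
    "predict tail: entity a | relation r | entity b\n",
    "predict tail: entity a|b | relation r | entity c\n",
]
for number, line in find_malformed_lines(sample):
    print(f"line {number} has too many separators: {line}")
```

To check a real file, pass an open file handle instead of the list, e.g. `find_malformed_lines(open(path, encoding="utf-8"))`, where `path` points to your training file.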

Hi, please see https://github.com/apoorvumang/kgt5/issues/18#issuecomment-1227189777 for the updated mappings from Wikidata ID to text. You can use these to reconstruct the dataset, and I would recommend doing so.

I will update the README soon to point to these mappings instead of our earlier KGC dataset. Let me know whether this works for you.

apoorvumang avatar Oct 06 '22 14:10 apoorvumang
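A rough sketch of what reconstructing the dataset from ID-to-text mappings could look like. It assumes the triples are tab-separated ID lines (head, relation, tail) and that the mappings have already been loaded into plain dictionaries; the function name and the file layout are assumptions, not the format of the linked mappings or of the kgt5 scripts.

```python
from typing import Dict, Iterable, Iterator, Tuple


def verbalize_triples(
    id_lines: Iterable[str],
    entity_text: Dict[str, str],
    relation_text: Dict[str, str],
) -> Iterator[Tuple[str, str, str]]:
    """Yield (head, relation, tail) as text, skipping unmappable triples."""
    for line in id_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # not a head/relation/tail line
        head, relation, tail = parts
        try:
            yield entity_text[head], relation_text[relation], entity_text[tail]
        except KeyError:
            continue  # at least one ID has no text mapping


# Illustrative mappings; real ones would be loaded from the mapping files.
entities = {"Q42": "Douglas Adams", "Q5": "human"}
relations = {"P31": "instance of"}
triples = ["Q42\tP31\tQ5\n"]

for head, relation, tail in verbalize_triples(triples, entities, relations):
    print(head, "|", relation, "|", tail)
```

Skipping triples with missing IDs is one possible design choice; logging them instead may be preferable if you need to know how much of the dataset is dropped.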

In #2, I ran into an already-solved problem: I could not get the entity strings from wikidata5m. It seems I was using an old dataset. I will try the updated mappings from Wikidata ID to text. Thank you for your answer; I look forward to the README update.

github-cqy avatar Oct 07 '22 05:10 github-cqy