WikiEntVec

entity_vectors in English from wikipedia

Open ypc-stu opened this issue 6 years ago • 5 comments

Thanks for sharing. Are there English entity vectors trained on Wikipedia? https://github.com/singletongue/WikiEntVec/releases

ypc-stu May 06 '19 03:05

The vectors are provided for Japanese; I would like to know whether an English version exists.

ypc-stu May 06 '19 03:05

We haven't released an English version of the pretrained word/entity vectors. You will need to download an English dump and train the vectors on it yourself.

singletongue May 06 '19 06:05

Thank you for your patient answer. I'm going to download the English dump from Wikipedia: https://dumps.wikimedia.org/other/cirrussearch/20190225/ I'm not sure whether this is the right place to download the English dump, or which file I should take. I look forward to your reply.

ypc-stu May 06 '19 08:05

You will need a dump file named **wiki-YYYYMMDD-cirrussearch-content.json.gz, where the leading ** specifies a language (en for English). So the correct file for you is enwiki-20190225-cirrussearch-content.json.gz. Please keep in mind that the English dump file is very large (~27GB); the download will take several hours.

After the file is downloaded, follow the steps for manual training described in README.md.
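The naming rule above can be sketched in Python. This is only an illustration, not part of WikiEntVec: the helper names `dump_filename`, `dump_url`, and `download` are hypothetical, and the URL layout (dump directory, then date, then file name) is an assumption inferred from the directory link and file name quoted in this thread. Whether a given dump date is still hosted has not been checked.

```python
import shutil
import urllib.request

# Assumed root, taken from the directory URL quoted earlier in this thread.
DUMP_ROOT = "https://dumps.wikimedia.org/other/cirrussearch"


def dump_filename(lang: str, date: str) -> str:
    """Build the dump file name: <lang>wiki-YYYYMMDD-cirrussearch-content.json.gz."""
    if len(date) != 8 or not date.isdigit():
        raise ValueError("date must be eight digits in YYYYMMDD form")
    return f"{lang}wiki-{date}-cirrussearch-content.json.gz"


def dump_url(lang: str, date: str) -> str:
    """Join root, date directory, and file name (layout is an assumption)."""
    return f"{DUMP_ROOT}/{date}/{dump_filename(lang, date)}"


def download(url: str, dest: str) -> None:
    """Stream the response to disk in chunks so a multi-GB file never sits in memory."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)


if __name__ == "__main__":
    # Only prints the URL; call download(...) explicitly to fetch the file.
    print(dump_url("en", "20190225"))
```

Streaming with `shutil.copyfileobj` is chosen here because of the file size mentioned above; a dedicated download tool that can resume interrupted transfers may be more practical for a file of that size.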

singletongue May 06 '19 22:05

I got it, thanks for your answers. Good luck!

ypc-stu May 07 '19 08:05