How to train on a new knowledge base?

Open • amirj opened this issue 3 years ago • 8 comments

It seems that a lot of people have asked how to train BLINK on a new knowledge base (i.e. a set of entities and their descriptions), but unfortunately I couldn't find any relevant information.

Could you please add a quick guide here?
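For reference, here is a minimal sketch of what "a set of entities + descriptions" might look like on disk in BLINK-style JSON Lines. It assumes that BLINK's entity catalogue uses the fields `title`, `text` and `idx` (what the candidate loader in `main_dense.py` appears to read) and that training mentions use `context_left`, `mention`, `context_right`, `label`, `label_title` and `label_id` (mirroring BLINK's Zeshel preprocessing). Treat these field names as assumptions to verify against your BLINK checkout. The entities, mentions and helpers (`build_catalogue`, `build_training_samples`) are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical custom knowledge base: an id, a title and a description per entity.
# Assumed BLINK catalogue fields: "idx", "title", "text" (verify in your checkout).
ENTITIES = [
    {"id": "Q1", "title": "Paris", "description": "Paris is the capital of France."},
    {"id": "Q2", "title": "Paris (mythology)", "description": "Paris is a prince of Troy in Greek mythology."},
]

# Hypothetical annotated mention linked to an entity of the knowledge base.
MENTIONS = [
    {"left": "She moved to ", "mention": "Paris", "right": " last year.", "entity_id": "Q1"},
]


def write_jsonl(path, records):
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def build_catalogue(entities):
    """Map the custom KB onto the assumed entity-catalogue fields."""
    return [{"idx": e["id"], "title": e["title"], "text": e["description"]} for e in entities]


def build_training_samples(mentions, entities):
    """Attach the gold entity's description, title and local index to each mention."""
    by_id = {e["id"]: (i, e) for i, e in enumerate(entities)}
    samples = []
    for m in mentions:
        local_id, entity = by_id[m["entity_id"]]
        samples.append({
            "context_left": m["left"],
            "mention": m["mention"],
            "context_right": m["right"],
            "label": entity["description"],
            "label_title": entity["title"],
            "label_id": local_id,  # assumed: integer position in the catalogue
        })
    return samples


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    write_jsonl(os.path.join(out_dir, "entity.jsonl"), build_catalogue(ENTITIES))
    write_jsonl(os.path.join(out_dir, "train.jsonl"), build_training_samples(MENTIONS, ENTITIES))
    print("wrote files to", out_dir)
```

The catalogue and the training samples are kept separate because, in a bi-encoder setup, the catalogue is encoded once for retrieval while the samples are only needed for training.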

amirj • Jun 07 '21 11:06

I've created a new repository for training bi-encoder models. Following this tutorial, you can train the model on a newer Wikipedia dump (or one in another language) using the BLINK code, or by following this tutorial.

Giovani-Merlin • Jan 02 '22 17:01

@Giovani-Merlin It seems those tutorial links you posted are no longer working, could you repost them?

driscoll42 • Apr 16 '22 19:04

@amirj: You can look at this tutorial https://github.com/facebookresearch/BLINK/issues/116

abhinavkulkarni • May 13 '22 17:05

The link seems to return a 404. Could you please update it to the correct link, @Giovani-Merlin? Thanks a lot!

kongmoumou • Nov 20 '22 14:11

@Giovani-Merlin: Can you provide access to the repository you mentioned?

viraj-lakshitha • Dec 03 '22 09:12

@Giovani-Merlin I would also be very grateful for access to your tutorial :)

gromajus • Feb 03 '23 10:02

@viraj-lakshitha @gromajus @kongmoumou @driscoll42 Hello! Sorry for the late reply; I needed to make considerable changes to the tutorials and the repo because I was unsatisfied with the final results. I've split the repo into two parts:

WBDSM, for creating the dataset from any Wikipedia dump in any language: https://github.com/Giovani-Merlin/wbdsm

Bet, for training bi-encoder models: https://github.com/Giovani-Merlin/bet
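As background on what "training a bi-encoder" means: in BLINK, the mention in context and each entity description are encoded independently, and a pair is scored by the dot product of the two vectors, so entity embeddings can be precomputed and indexed. The BLINK paper trains this by maximizing the gold entity's score against the other entities in the same batch. The sketch below shows that in-batch-negatives cross-entropy with plain Python lists standing in for encoder outputs; `in_batch_loss` is a hypothetical helper, and we have not checked whether Bet uses exactly this objective.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between a mention vector and an entity vector."""
    return sum(x * y for x, y in zip(a, b))


def in_batch_loss(mentions: List[Vector], entities: List[Vector]) -> float:
    """Mean cross-entropy where mentions[i] should match entities[i] and every
    other entity in the batch acts as a negative."""
    if len(mentions) != len(entities):
        raise ValueError("need exactly one gold entity per mention")
    total = 0.0
    for i, m in enumerate(mentions):
        scores = [dot(m, e) for e in entities]
        top = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_z - scores[i]  # negative log-probability of the gold entity
    return total / len(mentions)
```

With uninformative (all-zero) embeddings every entity is equally likely, so the loss equals log of the batch size; well-separated matching embeddings drive it toward zero.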

The results are fantastic. You can follow the process described at https://github.com/Giovani-Merlin/bet/blob/main/docs/results.md to train a custom model or to benchmark on the Zeshel dataset.
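When benchmarking a bi-encoder retriever on a dataset such as Zeshel, one commonly reported metric is recall@k: the fraction of mentions whose gold entity appears among the top-k retrieved candidates. A minimal sketch, assuming each mention's candidates are already ranked by score; `recall_at_k` is a hypothetical helper, not taken from the Bet repo, and we have not checked which metrics the linked results page reports.

```python
from typing import Sequence


def recall_at_k(ranked_candidates: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of mentions whose gold entity id is among their top-k candidates."""
    if len(ranked_candidates) != len(gold):
        raise ValueError("need one ranked candidate list per mention")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for cands, g in zip(ranked_candidates, gold) if g in cands[:k])
    return hits / len(gold)
```

For example, with candidate lists `["a", "b", "c"]`, `["d", "e", "f"]` and `["g", "h", "i"]` and gold ids `"b"`, `"f"` and `"x"`, recall@2 is 1/3 and recall@3 is 2/3.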

If you have any questions or issues, please open them in the respective repo's issue tracker :) I will improve the tutorials and documentation later on.

Giovani-Merlin • Jun 21 '23 16:06

I won't have time for a few weeks, but I will definitely give this a shot. Thanks for updating it!

driscoll42 • Jun 21 '23 17:06