BLINK
Generating .t7 file for inferencing
Hello, I am trying to generate a .t7 file for a trained model. To do that, I am running scripts/generate_candidates.py. This script needs another input file, saved_candidates_ids. How do I create this candidate ids file? Any pointers would help me run the inference code.
@saswatidana Thanks for reporting this. We'll add instructions for generating candidates shortly.
@ledw I'm also curious about how to generate the .t7 file for a trained model. What is the format of the saved_candidates_ids file required in scripts/generate_candidates.py? Could you give me more instructions on this?
@ledw: Any updates on this?
I was also looking at the scripts/generate_candidates.py script, and it looks like it expects another pre-generated input file, saved_candidates_ids. Digging further into the code reveals that this is a torch tensor of the token_idx values of the candidates.
Can you please let us know how to generate this file?
This is needed so that we can introduce new candidates from newer versions of Wikipedia.
Thanks!
Sorry for not following up on this sooner. The token_idx values are generated by the BERT tokenizer. The tensor has the shape batch x vec, where each vec is the BERT token id vector for one input. These are the "ids" returned by this function: https://github.com/facebookresearch/BLINK/blob/main/blink/biencoder/data_process.py#L96
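For anyone trying to picture the batch x vec layout, here is a minimal sketch of how one padded id row per candidate could be built. It is not BLINK's own code: the toy vocabulary, the whitespace tokenizer, the `encode_candidate` helper and the `max_seq_length` of 8 are all hypothetical stand-ins so that the example runs without downloading a BERT model. The special token ids (0, 100, 101, 102) are the ones used by the bert-base-uncased vocabulary. Any handling of the entity title is left out here.

```python
# Special token ids as found in the bert-base-uncased vocabulary.
PAD_ID, UNK_ID, CLS_ID, SEP_ID = 0, 100, 101, 102

# Hypothetical toy vocabulary standing in for the real BERT WordPiece vocabulary.
TOY_VOCAB = {
    "paris": 1000, "is": 1001, "the": 1002, "capital": 1003,
    "of": 1004, "france": 1005, "berlin": 1006, "germany": 1007,
}


def encode_candidate(text, max_seq_length=8):
    """Turn one candidate description into a fixed-length list of token ids."""
    # Assumption: whitespace splitting replaces real WordPiece tokenization.
    tokens = text.lower().split()
    ids = [TOY_VOCAB.get(tok, UNK_ID) for tok in tokens]
    # Leave room for the [CLS] and [SEP] markers, then add them.
    ids = ids[: max_seq_length - 2]
    ids = [CLS_ID] + ids + [SEP_ID]
    # Pad on the right so every row has the same length.
    ids += [PAD_ID] * (max_seq_length - len(ids))
    return ids


candidates = ["Paris is the capital of France", "Berlin Germany"]

# One row per candidate: this nested list has the batch x vec shape.
candidate_ids = [encode_candidate(c) for c in candidates]

for row in candidate_ids:
    print(row)
```

With a real setup, the rows would come from the BERT tokenizer rather than the toy vocabulary. Converting the nested list with `torch.tensor(candidate_ids)` and writing it out with `torch.save` should then give a tensor file in this layout, assuming the script reads the saved_candidates_ids file with `torch.load`.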
Thanks @ledw-2: I was able to follow your reply and generate embeddings for candidate entities: https://github.com/facebookresearch/BLINK/issues/106#issuecomment-1014507351