
Generating .t7 file for inference

Open · saswatidana opened this issue 4 years ago · 5 comments

Hello, I am trying to generate a .t7 file for a trained model. For that I am running scripts/generate_candidates.py. This Python script needs another input file, saved_candidates_ids. How do I create this file? Any pointers would help me run the inference code.

saswatidana avatar Jan 19 '21 10:01 saswatidana

@saswatidana Thanks for reporting this. We'll update the instructions on generating candidates shortly.

ledw avatar Jan 27 '21 04:01 ledw

@ledw I'm also curious about how to generate the .t7 file for a trained model. What is the format of the saved_candidates_ids file required in scripts/generate_candidates.py? Could you give me more instructions on this?

JinfengXiao avatar Jun 23 '21 14:06 JinfengXiao

@ledw: Any updates on this?

I was also looking at the scripts/generate_candidates.py script, and it looks like it expects another pre-generated input file, saved_candidates_ids. Digging into the code reveals that this is a torch tensor of candidate token_idxs.

Can you please let us know how to generate this file?

This is needed so that we can introduce new candidates from newer versions of Wikipedia.

Thanks!

abhinavkulkarni avatar Jan 05 '22 16:01 abhinavkulkarni
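
For context, the expected file is just a serialized 2-D tensor of BERT token ids, one row per candidate. Here is a minimal sketch of inspecting such a file once it exists; the path below is hypothetical.

```python
import torch

# Hypothetical path: whatever file you pass to scripts/generate_candidates.py
# as the pre-generated candidate token ids.
cand_ids = torch.load("models/saved_candidates_ids.pt")

# Expected layout: (num_candidates, max_candidate_length), i.e. one BERT
# token-id vector per candidate entity.
print(type(cand_ids), cand_ids.shape, cand_ids.dtype)
print(cand_ids[0])  # token ids of the first candidate
```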

Sorry for not updating on this. The token_idxs are generated with the BERT tokenizer. The format is batch x vec, where vec is the BERT token-id vector of the input. It's the "ids" output of this function: https://github.com/facebookresearch/BLINK/blob/main/blink/biencoder/data_process.py#L96

ledw-2 avatar Jan 14 '22 03:01 ledw-2
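
Based on that description, here is a minimal sketch of building a batch x vec tensor of candidate token ids with the Hugging Face BERT tokenizer and saving it for scripts/generate_candidates.py. The tokenizer name, max length, output path, and title/description concatenation below are assumptions; the exact candidate formatting (including BLINK's special title token) is defined by the function linked above.

```python
import torch
from transformers import BertTokenizer

# Assumption: bert-large-uncased tokenizer and a max candidate length of 128;
# adjust to the values used when the model was trained.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
MAX_CAND_LENGTH = 128

# Toy candidate list; in practice these come from your entity catalogue
# (e.g. a newer Wikipedia dump), each with a title and a description.
candidates = [
    {"title": "Paris", "text": "Paris is the capital and most populous city of France."},
    {"title": "Paris, Texas", "text": "Paris is a city in Lamar County, Texas, United States."},
]

ids = []
for cand in candidates:
    # Rough approximation of the candidate representation: title followed by
    # description, with special tokens, padded/truncated to a fixed length.
    # The exact formatting lives in blink/biencoder/data_process.py
    # (get_candidate_representation, linked above).
    enc = tokenizer.encode(
        cand["title"] + " " + cand["text"],
        add_special_tokens=True,
        max_length=MAX_CAND_LENGTH,
        truncation=True,
        padding="max_length",
    )
    ids.append(enc)

# batch x vec tensor of BERT token ids, one row per candidate.
cand_ids = torch.tensor(ids, dtype=torch.long)
torch.save(cand_ids, "saved_candidates_ids.pt")
print(cand_ids.shape)  # (num_candidates, MAX_CAND_LENGTH)
```

As I understand the thread, scripts/generate_candidates.py then runs these token ids through the trained biencoder to produce the candidate embeddings that end up in the .t7 file, which is what the follow-up comment below reports doing.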

Thanks @ledw-2: I was able to follow your reply and generate embeddings for candidate entities: https://github.com/facebookresearch/BLINK/issues/106#issuecomment-1014507351

abhinavkulkarni avatar Jan 17 '22 13:01 abhinavkulkarni