Biomedical-Entity-Linking

about candidate generation

Open acadTags opened this issue 3 years ago • 2 comments

Hi Lihu, this is very good work.

I have some questions about adapting it to another dataset.

Candidate generation does not seem very straightforward, for example how to obtain the files test_candidates.txt and training_aligned_cos_with_mention_candidate.txt. I have looked at generate_candidate.py, but I did not find it easy to apply to my own data.

While we could write code to produce these files based on the descriptions in the paper, it would be more helpful if additional scripts for running candidate generation were available, in case you have them. It would also be great to know more about how to generate candidates, in case I missed something. Thanks.

Best regards, A

acadTags, Jul 18 '22 19:07

Hi,

To make this easier to follow, I have added a simplified version of the Python candidate generation script, source/candidate_sample.py. You can apply this script to your own dataset to produce the *_candidate.txt files.

The core function is find_topk_candidates(mention, entity_set, emb_matrix, topk), where entity_set is the reference KB containing the surface forms of entities and emb_matrix holds the pre-trained word embeddings.

Note that if a mention has an exact match in the KB, the other candidates can be filtered out, although the script does not include this step.
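For readers adapting this to another dataset, the idea described above can be sketched roughly as follows. This is not the code in source/candidate_sample.py: the function names, the dict-based embedding lookup (instead of an emb_matrix), the mean-pooling of word vectors, and the cosine-similarity ranking are all assumptions made for illustration. The exact-match shortcut mentioned above is included as the first step.

```python
import numpy as np


def embed_text(text, emb, dim):
    """Mean-pool the vectors of the known words in `text` (assumed encoding)."""
    vecs = [emb[w] for w in text.lower().split() if w in emb]
    if not vecs:
        return np.zeros(dim)  # no known words: zero vector
    return np.mean(vecs, axis=0)


def topk_candidates_sketch(mention, entity_set, emb, topk):
    """Hypothetical stand-in for find_topk_candidates.

    mention    : mention string to link
    entity_set : surface forms of the entities in the reference KB
    emb        : dict mapping a word to its pre-trained vector (assumption)
    topk       : number of candidates to return
    """
    entities = sorted(entity_set)  # fixed order so results are reproducible

    # Exact match: keep only that entity and drop all other candidates.
    lowered = {e.lower(): e for e in entities}
    if mention.lower() in lowered:
        return [lowered[mention.lower()]]

    dim = len(next(iter(emb.values())))
    m = embed_text(mention, emb, dim)
    E = np.stack([embed_text(e, emb, dim) for e in entities])

    # Cosine similarity; entities with a zero vector get similarity 0.
    denom = np.linalg.norm(E, axis=1) * np.linalg.norm(m)
    sims = np.divide(E @ m, denom, out=np.zeros(len(entities)), where=denom > 0)

    order = np.argsort(-sims, kind="stable")[:topk]
    return [entities[i] for i in order]


# Toy data, invented purely for illustration.
toy_emb = {
    "heart": np.array([1.0, 0.0, 0.0]),
    "attack": np.array([0.0, 1.0, 0.0]),
    "cardiac": np.array([0.9, 0.1, 0.0]),
    "arrest": np.array([0.0, 0.9, 0.1]),
    "headache": np.array([0.0, 0.0, 1.0]),
}
toy_kb = {"heart attack", "cardiac arrest", "headache"}

print(topk_candidates_sketch("Heart Attack", toy_kb, toy_emb, topk=2))
print(topk_candidates_sketch("heart arrest", toy_kb, toy_emb, topk=2))
```

The first call hits the exact-match branch and returns a single candidate. The second call falls through to similarity ranking and returns the two nearest surface forms. The list returned for each mention would then need to be written out in whatever layout the *_candidate.txt files use, which is not covered here.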

Hope it helps, Lihu

tigerchen52, Jul 24 '22 15:07

Thanks! Best regards, A

acadTags, Aug 25 '22 10:08