biobert-pytorch

How to get embedding without deleting duplicates?

Open WangyuchenCS opened this issue 2 years ago • 3 comments

Hi, how can I get embeddings without duplicates being dropped? The output .h5 file does not match the length of the input .txt file, because duplicate entries are removed.

WangyuchenCS avatar Nov 02 '22 04:11 WangyuchenCS

Hi @WangyuchenCS

Could you try --keep_text_order True when running the script?

mjeensung avatar Nov 02 '22 04:11 mjeensung

Thanks a lot, but it causes an error that did not occur before: [image]

WangyuchenCS avatar Nov 02 '22 05:11 WangyuchenCS

Thanks for reporting the error.

Could you replace lines 13-16 with the following?

entity_id = str(i)
entity_name = f[entity_id].attrs['text']
embedding = f[entity_id]['embedding'][:]

mjeensung avatar Nov 02 '22 08:11 mjeensung
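
For anyone reading later, here is a minimal sketch of how the three replacement lines above might be used to read every embedding back in input order, duplicates included. It assumes the .h5 file written with --keep_text_order True holds one group per input line, keyed by the line index as a string, with the text stored in the 'text' attribute and the vector in an 'embedding' dataset. That layout is inferred from the snippet above, not confirmed against the script. The helper names write_embeddings and read_embeddings and the file demo.h5 are hypothetical; the writer exists only so the example runs on its own.

```python
import os
import tempfile

import h5py
import numpy as np


def write_embeddings(path, texts, vectors):
    """Hypothetical writer: one group per input line, keyed by line index.

    Assumed layout only; the real script's writer is not shown in this thread.
    """
    with h5py.File(path, "w") as f:
        for i, (text, vec) in enumerate(zip(texts, vectors)):
            group = f.create_group(str(i))
            group.attrs["text"] = text
            group.create_dataset("embedding", data=vec)


def read_embeddings(path):
    """Return (texts, embeddings) in input order, keeping duplicates."""
    texts, embeddings = [], []
    with h5py.File(path, "r") as f:
        # Iterate by index instead of by text, so repeated texts are kept.
        for i in range(len(f)):
            entity_id = str(i)
            entity_name = f[entity_id].attrs["text"]
            embedding = f[entity_id]["embedding"][:]
            texts.append(entity_name)
            embeddings.append(embedding)
    return texts, np.stack(embeddings)


# Toy input with a duplicate entry ("aspirin" appears twice).
texts = ["aspirin", "ibuprofen", "aspirin"]
vectors = np.arange(12, dtype=np.float32).reshape(3, 4)

path = os.path.join(tempfile.mkdtemp(), "demo.h5")
write_embeddings(path, texts, vectors)
read_texts, read_vectors = read_embeddings(path)
```

Because the groups are keyed by position rather than by the text itself, the number of rows read back should equal the number of lines in the input file.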