Extracting embeddings with extraction_strategy="per_token"

JiangYanting opened this issue 3 years ago.

Question

Hello! I used the script "embeddings_extraction.py" with the following input sentence:

    from farm.infer import Inferencer

    # lang_model, use_gpu and batch_size are defined earlier in embeddings_extraction.py
    basic_texts = [
        {"text": "apple is delicious fruit"}
    ]
    model = Inferencer.load(lang_model, task_type="embeddings", gpu=use_gpu,
                            batch_size=batch_size, extraction_strategy="per_token",
                            extraction_layer=-1, num_processes=0)
    result = model.inference_from_dicts(dicts=basic_texts)

I set extraction_strategy="per_token", and the printed len(result[0]["vec"]) is 256. Each element, for example result[0]["vec"][0], is a 768-dimensional vector. I am wondering whether this vector, result[0]["vec"][0], is the representation of the first word, "apple", or of the "[CLS]" token. Thank you very much!
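A minimal sketch for lining up rows with tokens, under an assumption that is not confirmed in this issue: that with per_token the output keeps one row per position of the padded input, so 256 rows would correspond to the input padded to a maximum sequence length of 256, with [CLS] at row 0, the word pieces in order after it, then [SEP] and padding rows. The helper `align_per_token`, the `SPECIAL_TOKENS` set and the dummy vectors are hypothetical illustrations, not part of FARM. One way to test the assumption on real output would be to compare result[0]["vec"][0] with the vector returned for the same text by extraction_strategy="cls_token".

```python
from typing import List, Sequence, Tuple

Vector = List[float]
# Hypothetical: BERT-style special tokens assumed to appear in the input
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def align_per_token(tokens: Sequence[str], vecs: Sequence[Vector],
                    skip_special: bool = True) -> List[Tuple[str, Vector]]:
    """Pair each input token with its row in a padded per-token output.

    Hypothetical helper. Assumes vecs holds one row per input position
    (padded to the maximum sequence length) and that tokens lists the
    non-padding tokens in order, with [CLS] at position 0.
    """
    if len(tokens) > len(vecs):
        raise ValueError("more tokens than vector rows")
    # zip stops after the last real token, so trailing padding rows are dropped
    pairs = list(zip(tokens, vecs))
    if skip_special:
        pairs = [(tok, vec) for tok, vec in pairs if tok not in SPECIAL_TOKENS]
    return pairs


# Dummy stand-in for result[0]["vec"]: 256 rows of 768 values,
# where every value in row i equals i so rows are easy to identify.
dummy_vecs = [[float(i)] * 768 for i in range(256)]
# Assumed tokenization: each of the four words stays a single word piece
tokens = ["[CLS]", "apple", "is", "delicious", "fruit", "[SEP]"]

pairs = align_per_token(tokens, dummy_vecs)
print([tok for tok, _ in pairs])   # ['apple', 'is', 'delicious', 'fruit']
print(pairs[0][1][0])              # 1.0, i.e. "apple" sits in row 1 under this assumption
```

If the tokenizer splits a word into several word pieces, that word spans several rows, so the token list has to come from the model's actual tokenization rather than from splitting the text on spaces.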


Aug 09 '22 05:08 JiangYanting