ChatGLM-6B

[Help] How to get embedding-layer representations

Open 1991Troy opened this issue 1 year ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

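With ChatGPT, you can get embedding-layer representations via openai.Embedding.create(model=model, input=text). How can I get embedding-layer representations when calling the GLM model through Hugging Face? If I go on to build something like ChatPDF, the embedding information would be quite useful. Thanks.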

Expected Behavior

No response

Steps To Reproduce

help

Environment

- OS:Ubuntu 20.04
- Python:3.8
- Transformers: 4.26.1
- PyTorch:1.12
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Anything else?

No

1991Troy avatar Mar 17 '23 06:03 1991Troy

Same question.

huangjiaheng avatar Mar 20 '23 06:03 huangjiaheng

Same question.

zhongtao93 avatar Mar 21 '23 03:03 zhongtao93

Same question.

stallboy avatar Mar 22 '23 04:03 stallboy

Same question.

zlszhonglongshen avatar Mar 23 '23 04:03 zlszhonglongshen

I tried this on my side. Used directly, the results don't seem very good: ChatGLM-text-embedding

georgechen1827 avatar Mar 30 '23 07:03 georgechen1827

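There is currently no API for getting embeddings directly. For now, you can get the hidden-layer representations by setting output_hidden_states=True; see the following code for reference: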

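from typing import Optional, Tuple

import torch
from transformers import PreTrainedModel, PreTrainedTokenizer

def get_hidden_states(
    text: str, model: PreTrainedModel, tokenizer: PreTrainedTokenizer
) -> Optional[Tuple[torch.Tensor, ...]]:
    model = model.eval()  # switch off dropout for deterministic outputs
    inputs = tokenizer([text], return_tensors='pt').to(model.device)
    with torch.no_grad():  # inference only, no gradients needed
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states  # one tensor per layer (plus the embedding output)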
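
The hidden states returned above are per-token, so a single text embedding still needs a pooling step. Below is a minimal sketch of one common option, masked mean pooling followed by L2 normalization. This is not something the ChatGLM-6B authors prescribe: the helper name `mean_pool` is hypothetical, and the sketch assumes the layer tensor has shape (batch, seq_len, hidden_size) and the mask has shape (batch, seq_len). Check the shapes your model actually returns and transpose if the sequence dimension comes first.

```python
import torch

def mean_pool(hidden: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Average token vectors while ignoring padded positions.

    hidden: (batch, seq_len, hidden_size), assumed layout
    mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = mask.unsqueeze(-1).to(hidden.dtype)   # (batch, seq_len, 1)
    summed = (hidden * mask).sum(dim=1)          # (batch, hidden_size)
    counts = mask.sum(dim=1).clamp(min=1.0)      # avoid division by zero
    return summed / counts

# Toy check: random tensors stand in for one layer of model output.
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
emb = mean_pool(hidden, mask)
# Unit-length vectors make a dot product equal to cosine similarity.
emb = torch.nn.functional.normalize(emb, dim=-1)
```

With a real model, `hidden` would be one entry of the tuple returned by `get_hidden_states`, for example the last one. For a single unpadded text, an all-ones mask is enough. Whether embeddings pooled this way are good enough for retrieval is an open question.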

zhangch9 avatar Aug 16 '23 03:08 zhangch9