Generating Embeddings of Code Tokens using StarCoder

Open • code2graph opened this issue 2 years ago • 1 comment

I am exploring the possibility of using StarCoder to generate embeddings for code tokens and would like to know if this is feasible with the current implementation.

Questions:

  1. Is it possible to use StarCoder to generate embeddings of code tokens?
  2. If yes, how should we configure and use StarCoder to make it usable for generating embeddings of code tokens?

code2graph • Sep 23 '23 00:09

Hi, you can take the last hidden layer of the model as embeddings. However, it might be better to use an encoder for embeddings: we have trained a BERT-like code model called StarEncoder, which you can try at https://huggingface.co/bigcode/starencoder

loubnabnl • Nov 15 '23 15:11
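
For anyone landing here later, the "take the last hidden layer" suggestion above might look roughly like the sketch below. It is not from the maintainers. It assumes the Hugging Face `transformers` and `torch` packages, and that you have accepted the license for the gated `bigcode/starcoder` checkpoint (any smaller causal code model can be substituted via `model_id` for experimenting). The names `token_embeddings` and `mean_pool` are made up for illustration, and averaging the token vectors is just one common way to get a single vector per snippet, not something the reply prescribes.

```python
from typing import List, Tuple

# Assumption: the checkpoint name comes from the Hub; it is gated and very
# large, so pass a smaller model_id while experimenting.
MODEL_ID = "bigcode/starcoder"


def token_embeddings(code: str, model_id: str = MODEL_ID) -> List[Tuple[str, List[float]]]:
    """Return (token, vector) pairs taken from the model's last hidden layer."""
    # Imported lazily so mean_pool below works without these packages installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)  # base model, no LM head
    model.eval()

    inputs = tokenizer(code, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # last_hidden_state has shape (batch, seq_len, hidden_size); batch is 1 here.
    vectors = outputs.last_hidden_state[0].float().tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return list(zip(tokens, vectors))


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average per-token vectors into one snippet-level embedding."""
    if not vectors:
        raise ValueError("need at least one token vector")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


if __name__ == "__main__":
    pairs = token_embeddings("def add(a, b):\n    return a + b")
    print(len(pairs), "tokens;", len(pairs[0][1]), "dimensions each")
    snippet_vector = mean_pool([vec for _, vec in pairs])
```

Note that in a left-to-right model each token's vector only reflects the tokens before it, which is one reason a bidirectional encoder such as StarEncoder may give better embeddings.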