
Code clustering

Open • shaileshj2803 opened this issue 3 years ago • 1 comment

Can I use the DOBF model for code clustering to find similar code patterns? If so, can you suggest which model to begin with, or point me to any examples?

shaileshj2803, Oct 26 '21 20:10

Hi. Yes, you can use our released DOBF models for that (they use the RoBERTa tokenizer and architecture). The DOBF + DAE version is a reasonable place to start. I would expect it to give similar representations to code with similar semantics, since such code is likely to have similar variable names. The released models are listed here: https://github.com/facebookresearch/CodeGen/blob/main/docs/dobf.md#release

baptisteroziere, Nov 02 '21 18:11
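
For anyone landing here later, the clustering step itself can be sketched roughly as below. This is only an illustration, not code from the CodeGen repo: `toy_embed` is a hypothetical stand-in (a hashed bag of identifier tokens) for whatever vector you get out of the DOBF encoder, and the greedy threshold grouping and the 0.6 cutoff are assumptions chosen to keep the example short. With real model embeddings you would replace `toy_embed` and probably tune the threshold or use a proper clustering library.

```python
import math
import re
import zlib


def toy_embed(code, dim=1024):
    """Hypothetical stand-in for a DOBF encoder: hashed bag of identifier tokens.

    Replace this with the vector produced by the real model.
    """
    vec = [0.0] * dim
    for tok in re.findall(r"[A-Za-z_]\w*", code):
        # crc32 is used instead of hash() so results are stable across runs.
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.8):
    """Greedy grouping: put each vector in the first cluster whose
    first member is at least `threshold` similar, else start a new one."""
    clusters = []  # each cluster is a list of indices into `vectors`
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


snippets = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    result = a + b\n    return result",
    "for line in open(path):\n    print(line.strip())",
]
groups = cluster([toy_embed(s) for s in snippets], threshold=0.6)
print(groups)
```

Each inner list in `groups` holds the indices of snippets that ended up in the same cluster. The greedy pass depends on input order and compares only against the first member of each cluster, so treat it as a starting point rather than a robust method.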