CodeGen
Code clustering
Can I use the DOBF model for code clustering to find similar code patterns? If so, can you suggest which model to begin with, or point me to any examples?
Hi. You can use our released DOBF models to do that (they use the RoBERTa tokenizer and architecture). You can start with the DOBF + DAE version, for instance. I would expect it to give similar representations to code snippets with similar semantics (since such snippets would be likely to have similar variable names). https://github.com/facebookresearch/CodeGen/blob/main/docs/dobf.md#release
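For what it's worth, here is a minimal sketch of the clustering side once you have the model outputs. It is not from the CodeGen repo: it assumes you have already run each snippet through the encoder and kept the per-token hidden states, and it uses mean pooling plus a simple greedy cosine-similarity grouping as one possible choice. The helper names (`mean_pool`, `cosine`, `cluster_by_threshold`), the 0.9 threshold, and the toy 2-dimensional vectors are all made up for illustration; real hidden states would be much larger.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> Vector:
    """Average per-token vectors into one fixed-size snippet embedding."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(tok[i] for tok in token_vectors) / count for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_by_threshold(
    embeddings: Sequence[Sequence[float]], threshold: float = 0.9
) -> List[List[int]]:
    """Greedy grouping: put each snippet in the first cluster whose first
    member is at least `threshold` similar, otherwise start a new cluster."""
    clusters: List[List[int]] = []
    for idx, emb in enumerate(embeddings):
        for members in clusters:
            if cosine(embeddings[members[0]], emb) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Toy stand-ins for encoder outputs: one list of token vectors per snippet.
# (Hypothetical values; in practice these come from the model's hidden states.)
toy_token_vectors = [
    [[1.0, 0.0], [0.9, 0.1]],    # snippet 0
    [[0.95, 0.05], [1.0, 0.0]],  # snippet 1, close to snippet 0
    [[0.0, 1.0], [0.1, 0.9]],    # snippet 2, pointing elsewhere
]

embeddings = [mean_pool(tokens) for tokens in toy_token_vectors]
clusters = cluster_by_threshold(embeddings, threshold=0.9)
print(clusters)  # [[0, 1], [2]]
```

The greedy pass depends on the order of the snippets and on the threshold, so for a real dataset you would probably want to swap it for a standard clustering algorithm (k-means, agglomerative clustering, etc.) run on the same pooled embeddings.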