Generating embeddings of source code

Open Avv22 opened this issue 3 years ago • 2 comments

Hello,

Can you please explain how to use your model to generate embeddings for Python and for Java separately?

Thanks.

Avv22 • Oct 13 '21 14:10

Hi @Avv22,

You'll want to follow the usage instructions for the dataset pipeline.

This will only generate embeddings for Java files. To embed Python files, you'll need a Python extractor. The code2vec authors have referenced a Python extractor made by JetBrains which might be of use: Link.

Let me know if you get stuck on generating embeddings for Java. Unfortunately, Python isn't currently supported, so you'll have to do some hacking to get that working (e.g., by using the Python extractor linked above and updating the path here).

Thanks

basedrhys • Oct 26 '21 19:10

@basedrhys,

Thank you. It has been a while, but I tried code2vec and code2seq. code2vec did not work because the astminer tool does not produce all the files code2vec needs to run: the dict file is missing, and I would have to construct it myself. As for Java embeddings, I have a dataset of 20k files. If I ran code2vec, would I get a file name prediction for each file? If so, that is not what I need: I am looking for a context vector that represents the whole file, not just a single method name. I hope my question is clear. Thanks in advance.

Avv22 • Nov 24 '21 21:11
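
On the last question (one vector per file rather than per method): code2vec represents each method as a single code vector, so one possible workaround is to average the method vectors that belong to the same file. The snippet below is only a minimal sketch of that idea. It assumes the per-method vectors have already been exported somehow, and the names `mean_pool` and `file_embeddings`, the input layout, and the choice of mean pooling are all illustrative assumptions, not something this repository or code2vec provides.

```python
from typing import Dict, List, Sequence


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average equal-length vectors element-wise (hypothetical helper)."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    # zip(*vectors) walks the vectors column by column.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def file_embeddings(
    method_vectors_by_file: Dict[str, Sequence[Sequence[float]]],
) -> Dict[str, List[float]]:
    """Map each file path to the mean of its method vectors.

    Assumption: the input maps a file path to the list of method-level
    vectors extracted from that file. Files with no methods are skipped.
    """
    return {
        path: mean_pool(vectors)
        for path, vectors in method_vectors_by_file.items()
        if vectors
    }


# Toy data with 2-dimensional vectors, only to show the shape of the input.
example = {
    "Foo.java": [[1.0, 2.0], [3.0, 4.0]],
    "Bar.java": [[0.5, 0.5]],
}
embeddings = file_embeddings(example)
```

Mean pooling treats every method equally; weighting by method length or taking an element-wise max are alternatives, and which works best for a given dataset would need to be checked empirically.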