obfuscated-code2vec
Generating embeddings of source code
Hello,
Can you please explain how to use your model to generate embeddings for Python and for Java separately?
Thanks.
Hi @Avra2 ,
You'll want to follow the usage instructions for the dataset pipeline.
This will only generate embeddings for Java files. To embed Python files, you'll need a Python extractor. The code2vec authors reference a Python extractor made by JetBrains, which might be of use: Link.
Let me know if you get stuck on generating embeddings for Java. Unfortunately, Python isn't currently supported, so you'll have to do some hacking to get that working (e.g., by using the Python extractor linked above and updating the path here).
Thanks
@basedrhys.
Thank you. It has been a while, but I tried code2vec and code2seq. code2vec did not work because the astminer tool does not produce all the files code2vec needs to run: the dict file is missing, and I would have to construct it myself. So, for Java embeddings: I have a dataset of 20k files. If I ran code2vec, I would get a file name prediction for each file, is that correct? If so, I am looking for a context vector representing the whole file, not just a single method name. Hopefully my question is clear, and thanks in advance.
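For the whole-file vector asked about above, one possible workaround is to average the per-method vectors that belong to the same file. This is only a minimal sketch, not something this repository or code2vec is confirmed to provide. It assumes you already have one vector per method, each tagged with the file it came from. The function name `mean_pool_by_file` and the `(file_name, vector)` input format are hypothetical, and mean pooling is a swapped-in aggregation rather than the model's own file-level representation.

```python
from typing import Dict, List, Sequence, Tuple


def mean_pool_by_file(
    method_vectors: Sequence[Tuple[str, Sequence[float]]],
) -> Dict[str, List[float]]:
    """Average per-method vectors into one vector per source file.

    `method_vectors` is a hypothetical input format: (file_name, vector)
    pairs, one pair per method that was embedded.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for file_name, vector in method_vectors:
        if file_name not in sums:
            # First method seen for this file: start a zero accumulator.
            sums[file_name] = [0.0] * len(vector)
            counts[file_name] = 0
        elif len(sums[file_name]) != len(vector):
            raise ValueError(f"vector size mismatch in {file_name}")
        # Add this method's vector to the running element-wise sum.
        for i, value in enumerate(vector):
            sums[file_name][i] += value
        counts[file_name] += 1
    # Divide each sum by the number of methods in that file.
    return {
        name: [total / counts[name] for total in vector_sum]
        for name, vector_sum in sums.items()
    }


if __name__ == "__main__":
    demo = [
        ("A.java", [1.0, 3.0]),
        ("A.java", [3.0, 5.0]),
        ("B.java", [2.0, 2.0]),
    ]
    print(mean_pool_by_file(demo))
```

A plain average treats every method equally. Weighting by method length, or using max pooling instead, are alternative choices that may suit some datasets better.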