How to create code2vec input

Open messiGao opened this issue 1 year ago • 9 comments

I ran `java -cp JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir test.java > file.txt`, and then `python3 code2vec.py --load models/java14_model/saved_model_iter8.release --test file.txt`, but I get this error: `return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict, tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 201 fields but have 4 in record [[{{node IteratorGetNext}}]]`.

messiGao avatar Nov 16 '23 11:11 messiGao

Hi @messiGao , Thank you for your interest in our work.

I think there is some confusion here: the exception is raised by TensorFlow, while the java command that you mentioned does not involve TensorFlow at all.

May I also ask what kinds of tasks you are looking into? Maybe I can recommend a newer model.

Best, Uri

urialon avatar Nov 16 '23 11:11 urialon

I want to use the `--test` option to export <TEST_FILE>.vectors, but I don't know what kind of TEST_FILE is correct. When I asked GPT-4, the answer was to use the JavaExtractor to convert my test.java to test.txt.

messiGao avatar Nov 16 '23 12:11 messiGao

Additionally, my aim is to store a Java codebase in a vector database so that I can run similarity searches and retrieve the code files relevant to my query.
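The retrieval step described here can be sketched independently of whichever model produces the embeddings. The example below is a minimal illustration, assuming each code file has already been mapped to a fixed-length vector; the function names, file names, and toy vectors are hypothetical, and a real vector database would replace the in-memory dictionary.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          index: Dict[str, Sequence[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Return the k entries of `index` most similar to `query`."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: hypothetical file names mapped to made-up 2-dimensional vectors.
toy_index = {"Parser.java": [1.0, 0.0], "Writer.java": [0.0, 1.0]}
print(top_k([0.9, 0.1], toy_index, k=1))
```

A linear scan like this is only practical for small codebases; it is meant to show the ranking logic, not to stand in for an indexed store.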

messiGao avatar Nov 16 '23 12:11 messiGao

Hi @messiGao ,

Please see https://github.com/neulab/code-bert-score. You don't need the approach itself, but it contains Huggingface models, including one specifically for Java called neulab/codebert-java.

This will allow you to use the Huggingface library with that model and a BERT-like framework.

Best, Uri

urialon avatar Nov 16 '23 15:11 urialon

I have a similar dilemma with creating embeddings of C# code using a code2vec model I have trained. As @messiGao mentioned, I want to use the `--test` option to create the <TEST_FILE>.vectors file described in the repo, but when I execute the command it gives the following error:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: Expect 201 fields but have 2 in record
         [[node IteratorGetNext (defined at /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]]
```
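The field count reported in this error can be checked directly on the input file. The sketch below assumes the reader splits each record on single spaces, which would explain why a raw extractor line holding a method name and a few contexts is reported as having only 2 or 4 fields. The function name and the sample record are hypothetical.

```python
from collections import Counter
from typing import Iterable


def field_count_histogram(lines: Iterable[str]) -> Counter:
    """Map 'number of space-delimited fields' to 'number of records'."""
    counts: Counter = Counter()
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue  # ignore completely empty lines
        # Split on single spaces so that empty padding fields are counted too.
        counts[len(record.split(" "))] += 1
    return counts


# Hypothetical raw record: a method name followed by three contexts.
sample = ["get|name a,101,b c,102,d e,103,f\n"]
print(field_count_histogram(sample))
```

If every record in the file passed to `--test` reports far fewer than 201 fields, the file is most likely raw extractor output rather than a preprocessed file.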

asyed79gatech avatar Feb 22 '24 09:02 asyed79gatech

Hi @asyed79gatech , Thank you for your interest in our work.

I believe that you haven't run the preprocess.sh script on the data.
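To illustrate why preprocessing changes the field count, here is a rough sketch of the padding arithmetic only. It assumes that the 201 fields in the error are one label plus 200 context slots, and that short records are padded with empty space-delimited fields. The actual preprocess.sh pipeline also builds vocabulary histograms and samples contexts, so this hypothetical `pad_record` helper is not a replacement for it, and nothing in this thread confirms that a file padded this way is sufficient for `--test`.

```python
MAX_CONTEXTS = 200  # assumption: 1 label + 200 contexts = the 201 fields in the error


def pad_record(raw_line: str, max_contexts: int = MAX_CONTEXTS) -> str:
    """Pad or truncate one raw extractor record to exactly max_contexts + 1 fields."""
    parts = raw_line.split()
    if len(parts) < 2:
        raise ValueError("record needs a label and at least one context")
    label, contexts = parts[0], parts[1:]
    # Plain truncation for illustration; the real script samples contexts instead.
    contexts = contexts[:max_contexts]
    # One extra space per missing context creates one empty field each.
    padding = " " * (max_contexts - len(contexts))
    return label + " " + " ".join(contexts) + padding


# Hypothetical raw record with three contexts.
padded = pad_record("get|name a,101,b c,102,d e,103,f")
print(len(padded.split(" ")))
```

The point of the sketch is that a record with only three contexts still has to occupy all 201 space-delimited slots before the TensorFlow reader will accept it.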

However in general, I recommend using the newer https://github.com/neulab/code-bert-score project. It is based on Huggingface, which is actively maintained.

Best, Uri

urialon avatar Feb 22 '24 13:02 urialon

Hi @urialon

Thanks for your prompt response. I thought we only needed to run the preprocess.sh script when training the code2vec model. Right now, I already have a trained, released model and want it to generate embeddings for a vector store.
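Once a <TEST_FILE>.vectors file has been produced, loading it for a vector store could look like the sketch below. It assumes the file holds one vector per row, written as whitespace-separated floats, with row i corresponding to row i of the test file; that layout is an assumption here and should be checked against the repository README. The function name is hypothetical.

```python
from typing import Iterable, List


def parse_vectors(lines: Iterable[str]) -> List[List[float]]:
    """Parse rows of whitespace-separated floats into a list of vectors."""
    vectors: List[List[float]] = []
    for line in lines:
        fields = line.split()
        if fields:  # skip blank rows; note this would shift row alignment
            vectors.append([float(value) for value in fields])
    return vectors


# Made-up rows standing in for the contents of a .vectors file.
rows = ["0.5 -0.25 1.0\n", "1 2 3\n"]
print(parse_vectors(rows))
```

With a real file, the same function can be called on an open file handle, for example `parse_vectors(open("file.txt.vectors"))`, where the file name is only a placeholder.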

asyed79gatech avatar Feb 22 '24 13:02 asyed79gatech

Hello, have you resolved your issue? How can Java source code be converted into the input format required by code2vec?

XuPing1234 avatar Jun 29 '24 04:06 XuPing1234

Hello, I encountered the same issue. Have you resolved it?

zhaojialinnn avatar Aug 20 '24 09:08 zhaojialinnn