Preprocess.sh error (division by Zero)

Open ShaliniR11 opened this issue 3 years ago • 14 comments

Hi Dr. Alon, I have my own Java dataset and I am trying to preprocess it with the given script. I have changed the path directories in the script. I get the following output:

shali@DESKTOP-JNLA5ED MINGW64 ~/Documents/Git/code2vec (master)
$ sh preprocess.sh
preprocess.sh: line 21: C:/Users/shali/Documents/Git/code2vec/data/javadata/train/: Is a directory
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
Finished extracting paths from training set
Creating histograms from the training data
File: my_dataset.test.raw.txt
Traceback (most recent call last):
  File "C:\Users\shali\Documents\Git\code2vec\preprocess.py", line 133, in <module>
    num_examples = process_file(file_path=data_file_path, data_file_role=data_role, dataset_name=args.output_name,
  File "C:\Users\shali\Documents\Git\code2vec\preprocess.py", line 69, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero

My system setup: I am using Git Bash in Visual Studio Code to run the script.

OS: Windows 11

Java (java --version):
openjdk 17.0.3 2022-04-19
OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)

Python (python --version):
Python 3.10.4

CUDA (nvcc --version):
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_19:00:59_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

Below is my preprocess.sh in txt format: preprocess.txt

Please let me know how to proceed further.

ShaliniR11 avatar Jul 18 '22 15:07 ShaliniR11
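
The traceback above ends in a division by `total`, which is zero when no examples were read from the raw file. The following is a minimal sketch of a guard that would report this more directly. `average_total_contexts` is a hypothetical helper, not a function in preprocess.py, and the conclusion that the input file is empty is an inference from the traceback rather than something the script prints.

```python
def average_total_contexts(sum_total, total, file_path):
    """Hypothetical helper (not part of preprocess.py).

    Returns the average number of contexts per example, or raises a
    descriptive error when no examples were read at all.
    """
    if total == 0:
        # Assumption: zero examples means the path-extraction step
        # wrote an empty (or missing) raw file for this split.
        raise ValueError(
            "No examples were read from %r; the extraction step "
            "probably produced an empty file." % file_path
        )
    return float(sum_total) / total
```

With a guard like this, an empty my_dataset.test.raw.txt would surface as an error naming the file instead of a bare ZeroDivisionError.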

Hi @ShaliniR11 , Sorry for the delayed response.

Did you notice that you have a "space" token in line 21, right before the path? Can you delete this space and see if it helps?

Additionally, do you have subdirectories in the directory C:/Users/shali/Documents/Git/code2vec/data/javadata/train/? The code looks for subdirectories in the training path.

Best, Uri

urialon avatar Jul 21 '22 16:07 urialon
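
The directory layout asked about above can be checked with a short script. This is a minimal sketch, assuming each split directory (train, validation, test) should contain subdirectories that hold at least one .java file. `count_java_files` is a hypothetical helper and is not part of code2vec.

```python
from pathlib import Path


def count_java_files(dataset_dir):
    """Hypothetical helper: map each immediate subdirectory of
    dataset_dir to the number of .java files found beneath it."""
    root = Path(dataset_dir)
    counts = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob searches the subdirectory recursively.
        counts[sub.name] = sum(1 for _ in sub.rglob("*.java"))
    return counts
```

Calling it on the training path from this thread, for example `count_java_files("C:/Users/shali/Documents/Git/code2vec/data/javadata/train/")`, would show whether any subdirectory is empty. Files placed directly in the split directory are deliberately not counted, since the comment above says the code looks for subdirectories.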

Hi Dr. Alon, I have tried removing the space, but the error still occurs. I have a single subdirectory in each of train, test and val, like this: Screenshot (29)

and inside these subdirectories I have java files like this: Screenshot (30)

ShaliniR11 avatar Jul 24 '22 19:07 ShaliniR11

Can you try running the java process directly, e.g.,:

java -cp JavaExtractor/JPredict/target/JavaExtractor-0.0.1-SNAPSHOT.jar JavaExtractor.App --max_path_length 8 --max_path_width 2 --dir JavaExtractor/JPredict/src/main

?

urialon avatar Jul 25 '22 00:07 urialon

This is the console output from running the Java process directly: image

ShaliniR11 avatar Jul 25 '22 01:07 ShaliniR11

OK, so the base Java process is running fine. It looks like the problem is in some input/output redirection because of Windows. Can you try running that on a Linux machine, or BashOnWindows?

urialon avatar Jul 25 '22 01:07 urialon
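
If output redirection is failing as suggested above, the extracted raw files would be empty before preprocess.py ever runs. This is a minimal sketch of a check for that, assuming the extraction step writes one raw text file per split. `find_empty_files` is a hypothetical helper, and only the name my_dataset.test.raw.txt is taken from the output in this thread; any other file names passed in are assumptions.

```python
import os


def find_empty_files(paths):
    """Hypothetical helper: return the paths that are missing
    or have a size of zero bytes, in the order given."""
    return [
        p for p in paths
        if not os.path.isfile(p) or os.path.getsize(p) == 0
    ]


if __name__ == "__main__":
    # Only the test file name appears in the thread's console output.
    for path in find_empty_files(["my_dataset.test.raw.txt"]):
        print("Empty or missing:", path)
```

Running a check like this between the extraction step and the histogram step would separate a redirection problem from a problem inside preprocess.py.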

Hi @urialon, I am facing the same issue. @ShaliniR11, could you solve it?

(env_tensor) zunaira@snps-ubo9fomrduif C2V3 % source preprocess.sh
Extracting paths from validation set...
Finished extracting paths from validation set
Extracting paths from test set...
Finished extracting paths from test set
Extracting paths from training set...
Finished extracting paths from training set
Creating histograms from the training data
File: my_dataset.test.raw.txt
Traceback (most recent call last):
  File "/Users/zunaira/Downloads/C2V3/preprocess.py", line 133, in <module>
    num_examples = process_file(file_path=data_file_path, data_file_role=data_role, dataset_name=args.output_name,
  File "/Users/zunaira/Downloads/C2V3/preprocess.py", line 69, in process_file
    print('Average total contexts: ' + str(float(sum_total) / total))
ZeroDivisionError: float division by zero

zunairazaman2021 avatar Feb 08 '23 10:02 zunairazaman2021

Hi @zunairazaman2021 , Thank you for your interest in our work!

Can you try running the java process directly, as instructed earlier in this thread?

urialon avatar Feb 08 '23 11:02 urialon

@urialon Yes, I did: Screenshot 2023-02-08 at 11 57 22

zunairazaman2021 avatar Feb 08 '23 11:02 zunairazaman2021

@urialon I tried this as well: https://github.com/tech-srl/code2vec/pull/109, but it didn't work.

Note: I am using a Mac with an M1 chip, and I only changed the directories here:

TRAIN_DIR=/Users/zunaira/Downloads/C2V3/tmp/train
VAL_DIR=/Users/zunaira/Downloads/C2V3/tmp/validation
TEST_DIR=/Users/zunaira/Downloads/C2V3/tmp/test

zunairazaman2021 avatar Feb 08 '23 11:02 zunairazaman2021

Using #109, I get data stored in a tmp directory, but the c2v and raw.txt files are still empty. :( Screenshot 2023-02-08 at 11 57 22

zunairazaman2021 avatar Feb 08 '23 12:02 zunairazaman2021

Never mind, solved it with #109 :) Thanks

zunairazaman2021 avatar Feb 08 '23 12:02 zunairazaman2021

Great to hear :-)

urialon avatar Feb 08 '23 12:02 urialon

+1 on https://github.com/tech-srl/code2vec/pull/109 solution, I had the same issue and the PR from gOATiful made it work

Lufedi avatar Mar 13 '23 08:03 Lufedi

Thanks @Lufedi and @zunairazaman2021 , I merged that PR.

urialon avatar Mar 13 '23 12:03 urialon