
UncompletedJobError: No output/error stream produced

Open sushantkumar007007 opened this issue 2 years ago • 1 comments

I am running CodeGen in obfuscation mode on the test dataset (https://github.com/facebookresearch/CodeGen/tree/main/data/test_dataset) with the following command:

`run codegen_sources/preprocessing/preprocess.py data/python_test --mode obfuscation --local True --local_parallelism 4 --langs python --train_splits 1 --tokenization_timeout 400 --bpe_timeout 220 --train_bpe_timeout 400 --bpe_mode fast --fastbpe_use_vocab True --fastbpe_vocab_path data/bpe/cpp-java-python/vocab --fastbpe_code_path data/bpe/cpp-java-python/codes --keep_comments False --ncodes 4000 --percent_test_valid 2`

I am getting the following error:

`INFO - 05/04/22 15:56:33 - 0:00:00 - Dataset pipeline for /home/sushantk/anaconda3/codeGen/data/python_test


INFO - 05/04/22 15:56:33 - 0:00:00 - ========== Extract and Tokenize ===========
INFO - 05/04/22 15:56:33 - 0:00:00 - Using 4 processors.
INFO - 05/04/22 15:56:33 - 0:00:00 - python: tokenizing and extracting parallel functions in 1 json files ...
INFO - 05/04/22 15:56:33 - 0:00:00 - Number of lines to process: 50
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content Missing parentheses in call to 'print'. Did you mean print('\nThe best BASE85 based alphabet for your setup is: %s' \)? (<unknown>, line 1673) 
                                        
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment 
                                        
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment 
                                        
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content Missing parentheses in call to 'print'. Did you mean print("Press control+C to stop and show the summary")? (<unknown>, line 43) 
                                        
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment 
                                        
WARNING - 05/04/22 15:56:33 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment 
                                        
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content Missing parentheses in call to 'print'. Did you mean print("permantly remove file ", file)? (<unknown>, line 374) 
                                        
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content local variable 'mangledName' referenced before assignment 
                                        
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content invalid syntax (<unknown>, line 426) 
                                        
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content Missing parentheses in call to 'print'. Did you mean print("\nBEGIN - expecting GEOS_ERROR)? (<unknown>, line 135) 
                                        
WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content invalid syntax (<unknown>, line 92) 
                                        WARNING - 05/04/22 15:56:34 - 0:00:01 - Error obfuscating content invalid syntax (<unknown>, line 62) 
                                        

100%|██████████| 50/50 [00:00<00:00, 3385.62it/s]
INFO - 05/04/22 15:56:34 - 0:00:01 - Time elapsed: 0.95
WARNING - 05/04/22 15:56:34 - 0:00:01 - Tokenization of /home/sushantk/anaconda3/codeGen/data/python_test/python.001 (1).json.gz:12 errors out of 50 lines(24.00%)
WARNING - 05/04/22 15:56:34 - 0:00:01 - Tokenization of /home/sushantk/anaconda3/codeGen/data/python_test/python.001 (1).json.gz:3 filtered examples in 50 lines(6.00%)


INFO - 05/04/22 15:56:34 - 0:00:01 - ========== Deduplicate and Split ===========
INFO - 05/04/22 15:56:34 - 0:00:02 - all files python.*[0-9].obfuscated.tok regrouped in /home/sushantk/anaconda3/codeGen/data/python_test/python.all.obfuscated.tok .
INFO - 05/04/22 15:56:34 - 0:00:02 - all files python.*[0-9].dictionary.tok regrouped in /home/sushantk/anaconda3/codeGen/data/python_test/python.all.dictionary.tok .
INFO - 05/04/22 15:56:34 - 0:00:02 - shuffling 2 files parallely: python.all.obfuscated.tok, python.all.dictionary.tok
INFO - 05/04/22 15:56:34 - 0:00:02 - python: Deduplication on 'obfuscated' and propagated on other suffixes.
INFO - 05/04/22 15:56:34 - 0:00:02 - python: Duplicated lines for obfuscated: 0 / 35
INFO - 05/04/22 15:56:34 - 0:00:02 - python: valid.obfuscated -> 0 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: test.obfuscated -> 0 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: train.obfuscated.0 -> 35 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: Duplicated lines for dictionary: 0 / 35
INFO - 05/04/22 15:56:35 - 0:00:02 - python: valid.dictionary -> 0 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: test.dictionary -> 0 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - python: train.dictionary.0 -> 35 lines
INFO - 05/04/22 15:56:35 - 0:00:02 - Sucessfully regroup, deduplicate and split tokenized data into a train/valid/test sets.


INFO - 05/04/22 15:56:35 - 0:00:02 - ========== Learn BPE ===========
INFO - 05/04/22 15:56:35 - 0:00:02 - No need to train bpe codes, already trained. Codes: data/bpe/cpp-java-python/codes


INFO - 05/04/22 15:56:35 - 0:00:02 - ========== Apply BPE ===========
INFO - 05/04/22 15:56:35 - 0:00:02 - Applying BPE on /home/sushantk/anaconda3/codeGen/data/python_test/python.train.dictionary.0.tok ...
INFO - 05/04/22 15:56:35 - 0:00:02 - Applying BPE on /home/sushantk/anaconda3/codeGen/data/python_test/python.train.obfuscated.0.tok ...
WARNING - 05/04/22 15:56:35 - 0:00:02 - /home/sushantk/anaconda3/codeGen/data/python_test/python.valid.dictionary.tok is not a valid file, cannot to apply BPE on it.
WARNING - 05/04/22 15:56:35 - 0:00:02 - /home/sushantk/anaconda3/codeGen/data/python_test/python.valid.obfuscated.tok is not a valid file, cannot to apply BPE on it.
WARNING - 05/04/22 15:56:35 - 0:00:02 - /home/sushantk/anaconda3/codeGen/data/python_test/python.test.dictionary.tok is not a valid file, cannot to apply BPE on it.
WARNING - 05/04/22 15:56:35 - 0:00:02 - /home/sushantk/anaconda3/codeGen/data/python_test/python.test.obfuscated.tok is not a valid file, cannot to apply BPE on it.
---------------------------------------------------------------------------
UncompletedJobError                       Traceback (most recent call last)
~/anaconda3/codeGen/codegen_sources/preprocessing/preprocess.py in <module>()
    212     args.input_path = os.path.abspath(args.input_path)
    213     multiprocessing.set_start_method("fork")
--> 214     preprocess(args)

~/anaconda3/codeGen/codegen_sources/preprocessing/preprocess.py in preprocess(args)
    103 
    104     dataset.apply_bpe(
--> 105         executor=cluster_apply_bpe, local_parallelism=args.local_parallelism
    106     )
    107     dataset.get_vocab(executor=cluster_train_bpe)

~/anaconda3/codeGen/codegen_sources/preprocessing/dataset_modes/obfuscation_mode.py in apply_bpe(self, executor, local_parallelism)
    127         _bpe_ext = self.bpe.ext
    128         self.bpe.ext += TMP_EXT
--> 129         super().apply_bpe(executor)
    130         self.bpe.ext = _bpe_ext
    131         # restore BPE on obfuscation special tokens

~/anaconda3/codeGen/codegen_sources/preprocessing/dataset_modes/dataset_mode.py in apply_bpe(self, executor, local_parallelism)
    615                 jobs.append(job)
    616         for job in jobs:
--> 617             job.result()
    618         logger.info("BPE done.")
    619         # logger.info("Regrouping BPE")

~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in result(self)
    264 
    265     def result(self) -> R:
--> 266         r = self.results()
    267         assert not self._sub_jobs, "You should use `results()` if your job has subtasks."
    268         return r[0]

~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in results(self)
    287             return [tp.cast(R, sub_job.result()) for sub_job in self._sub_jobs]
    288 
--> 289         outcome, result = self._get_outcome_and_result()
    290         if outcome == "error":
    291             job_exception = self.exception()

~/anaconda3/envs/codeGen_env/lib/python3.6/site-packages/submitit/core/core.py in _get_outcome_and_result(self)
    382             else:
    383                 message.append(f"No output/error stream produced ! Check: {self.paths.stdout}")
--> 384             raise utils.UncompletedJobError("\n".join(message))
    385         try:
    386             output: tp.Tuple[str, tp.Any] = utils.pickle_load(self.paths.result_pickle)

UncompletedJobError: Job 18686 (task: 0) with path /home/sushantk/anaconda3/codeGen/data/python_test/log/18686_0_result.pkl
has not produced any output (state: FINISHED)
No output/error stream produced ! Check: /home/sushantk/anaconda3/codeGen/data/python_test/log/18686_0_log.out`

When I open "python.test.dictionary.tok", "python.test.obfuscated.tok", "python.valid.dictionary.tok" and "python.valid.obfuscated.tok", they are all blank; nothing is being written to them.

Can you tell me why this is happening?
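A quick way to confirm which split files came out empty is to list the zero-byte `.tok` files in the dataset folder. This is only a minimal sketch: the helper name `empty_tok_files` is hypothetical and not part of CodeGen, and the folder in the usage comment is the one from the command above.

```python
from pathlib import Path


def empty_tok_files(data_dir, pattern="*.tok"):
    """Return the sorted names of files matching `pattern` in `data_dir` that are zero bytes long."""
    return sorted(
        p.name
        for p in Path(data_dir).glob(pattern)
        if p.is_file() and p.stat().st_size == 0
    )


# Example usage (path taken from the command above, adjust to your setup):
# for name in empty_tok_files("data/python_test"):
#     print("empty:", name)
```

If only the valid and test files show up, the tokenization step itself worked and the problem is in how the lines were split.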

sushantkumar007007, May 04 '22 14:05

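Hi, it may be because all 35 examples in the Python file you kept are sent to the training set: with --percent_test_valid 2, the log shows the valid and test splits getting 0 lines, so there is nothing to apply BPE to for those splits. Maybe try running it on the 3 Python files in the test dataset (it should still be quite fast), or increase --percent_test_valid to something like 10 or 20.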
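To see why 2 percent is too small for 35 lines, here is a rough sketch of the arithmetic. It assumes that the valid and test splits each receive the given percentage of the lines, rounded down; that rule is an assumption for illustration (the function `split_sizes` is hypothetical, not CodeGen code), although it does match the 0 / 0 / 35 split reported in the log.

```python
def split_sizes(n_lines, percent_test_valid):
    """Split sizes under an ASSUMED rule: valid and test each get
    floor(n_lines * percent_test_valid / 100) lines, train gets the rest."""
    n_held_out = n_lines * percent_test_valid // 100
    return {
        "valid": n_held_out,
        "test": n_held_out,
        "train": n_lines - 2 * n_held_out,
    }


for percent in (2, 10, 20):
    print(percent, split_sizes(35, percent))
# 2 {'valid': 0, 'test': 0, 'train': 35}
# 10 {'valid': 3, 'test': 3, 'train': 29}
# 20 {'valid': 7, 'test': 7, 'train': 21}
```

Under that assumption, any percentage below 3 leaves the valid and test files empty for 35 lines, which is why more input files or a larger --percent_test_valid should help.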

baptisteroziere, Jun 01 '22 16:06