RedPajama-Data
Unavailable Parameters
Hi! I am trying to download the crawl split 2023-50. I am running the command python -m cc_net --dump 2023-50, which raises the following error:
Will run cc_net.mine.main with the following config: Config(config_name='base', dump='2023-50', output_dir=PosixPath('data'), mined_dir='mined', execution='auto', num_shards=1600, min_shard=-1, num_segments_per_shard=-1, metadata=None, min_len=300, hash_in_mem=50, lang_whitelist=[], lang_blacklist=[], lang_threshold=0.5, keep_bucket=[], lm_dir=PosixPath('data/lm_sp'), cutoff=PosixPath('/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/data/cutoff.csv'), lm_languages=None, mine_num_processes=16, target_size='4G', cleanup_after_regroup=False, task_parallelism=-1, pipeline=['dedup', 'lid', 'keep_lang', 'sp', 'lm', 'pp_bucket', 'drop', 'split_by_lang'], experiments=[], cache_dir=None)
Submitting _hashes_shard in a job array (1600 jobs)
Traceback (most recent call last):
  File "/n/sw/Mambaforge-23.3.1-1/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/n/sw/Mambaforge-23.3.1-1/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/__main__.py", line 18, in <module>
    main()
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/__main__.py", line 14, in main
    func_argparse.parse_and_call(cc_net.mine.get_main_parser())
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/func_argparse/__init__.py", line 72, in parse_and_call
    return command(**parsed_args)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 638, in main
    all_files = mine(conf)
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 340, in mine
    hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/mine.py", line 265, in hashes
    ex(_hashes_shard, repeat(conf), *_transpose(missing_outputs))
  File "/n/home06/zhentingqi/RedPajama-Data/data_prep/cc/cc_net/cc_net/execution.py", line 106, in map_array_and_wait
    jobs = ex.map_array(function, *args)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/core/core.py", line 771, in map_array
    return self._internal_process_submissions(submissions)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/auto/auto.py", line 218, in _internal_process_submissions
    return self._executor._internal_process_submissions(delayed_submissions)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/slurm/slurm.py", line 332, in _internal_process_submissions
    array_ex.update_parameters(**self.parameters)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/core/core.py", line 810, in update_parameters
    self._internal_update_parameters(**kwargs)
  File "/n/home06/zhentingqi/.local/lib/python3.10/site-packages/submitit/slurm/slurm.py", line 306, in _internal_update_parameters
    raise ValueError(
ValueError: Unavailable parameter(s): ['slurm_time']
Valid parameters are:
- account (default: None)
- additional_parameters (default: None)
- array_parallelism (default: 256)
- comment (default: None)
- constraint (default: None)
- cpus_per_gpu (default: None)
- cpus_per_task (default: None)
- dependency (default: None)
- exclude (default: None)
- exclusive (default: None)
- gpus_per_node (default: None)
- gpus_per_task (default: None)
- gres (default: None)
- job_name (default: 'submitit')
- mail_type (default: None)
- mail_user (default: None)
- mem (default: None)
- mem_per_cpu (default: None)
- mem_per_gpu (default: None)
- nodelist (default: None)
- nodes (default: 1)
- ntasks_per_node (default: None)
- num_gpus (default: None)
- partition (default: None)
- qos (default: None)
- setup (default: None)
- signal_delay_s (default: 90)
- srun_args (default: None)
- stderr_to_stdout (default: False)
- time (default: 5)
- use_srun (default: True)
- wckey (default: 'submitit')
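Reading the traceback: the error is raised inside submitit's SLURM executor (`slurm.py`, `array_ex.update_parameters(**self.parameters)`), so a key named `slurm_time` reached that executor directly, while the list of valid parameters above only contains the unprefixed `time` (in minutes, default 5). The snippet below is a minimal, hypothetical model of that check, not submitit's or cc_net's actual code: `check_params` and `strip_cluster_prefix` are illustrative helpers, and `VALID_SLURM_PARAMS` is just a subset of the names printed in the error. It shows why `slurm_time` is rejected and how a `slurm_`-prefixed key maps onto the name the SLURM executor expects.

```python
# Hypothetical helpers (not part of submitit or cc_net): they only model the
# check that produced "Unavailable parameter(s): ['slurm_time']".

# A subset of the valid SLURM-executor parameter names listed in the error above.
VALID_SLURM_PARAMS = {
    "account", "array_parallelism", "cpus_per_task", "job_name",
    "mem", "nodes", "partition", "time",
}


def check_params(params: dict) -> None:
    """Raise ValueError for any key that is not a valid SLURM-executor name."""
    invalid = sorted(k for k in params if k not in VALID_SLURM_PARAMS)
    if invalid:
        raise ValueError(f"Unavailable parameter(s): {invalid}")


def strip_cluster_prefix(params: dict, cluster: str = "slurm") -> dict:
    """Drop a leading '<cluster>_' from each key, e.g. 'slurm_time' -> 'time'."""
    prefix = f"{cluster}_"
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in params.items()
    }


if __name__ == "__main__":
    raw = {"slurm_time": 60, "cpus_per_task": 4}  # illustrative values only
    try:
        check_params(raw)
    except ValueError as err:
        print(err)  # Unavailable parameter(s): ['slurm_time']
    fixed = strip_cluster_prefix(raw)
    check_params(fixed)  # passes: 'time' is a valid name
    print(fixed)  # {'time': 60, 'cpus_per_task': 4}
```

When going through submitit's `AutoExecutor`, the walltime is normally set with the generic `timeout_min` parameter, which submitit translates into the SLURM executor's `time`; passing an already-unprefixed `slurm_time` straight to the SLURM executor, as seems to happen here, bypasses that translation.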
Can someone please help me solve the problem? Thanks!