RedPajama-Data
Step 2) "Invalid option: ---input_base_uri"
bash scripts/apptainer_run_quality_signals.sh \
--config configs/rp_v2.0.conf \
--dump_id "2022-49" \
--input_base_uri "file:///path/to/data/root" \
--output_base_uri "file:///path/to/output/data/root" \
--max_docs -1
Invalid option: ---input_base_uri
Usage: apptainer_run_quality_signals.sh [ -c | --config ] [ -d | --dump_id ]
Good catch, thanks for reporting! The three flags --input_base_uri, --output_base_uri and --max_docs are actually set in the config file: https://github.com/togethercomputer/RedPajama-Data/blob/bb594b01a92b7e6fcf70cf3b6659851ce17edcce/configs/rp_v2.0.conf#L4-L6
You can just omit them from the call to the apptainer script.
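For reference, here is roughly what the call from the report looks like once the three flags are removed. This is a sketch, assuming it is run from the repository root and that the corresponding values in configs/rp_v2.0.conf already point at your data. The `cmd` array is not part of the repository; it is only an illustrative way to print the command before running it.

```shell
# Corrected call: only the two options listed in the usage message are passed.
# Input/output locations and the document limit are read from the config file.
cmd=(bash scripts/apptainer_run_quality_signals.sh
     --config configs/rp_v2.0.conf
     --dump_id "2022-49")

# Print the command for inspection.
# To actually run it, execute "${cmd[@]}" from the repository root.
printf '%s\n' "${cmd[*]}"
```

Printing first makes it easy to confirm that no leftover flag is still being passed before the job is started.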