Preprocess with --debug gives an error.
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Preprocess with --debug should work, but it gives an error; without --debug it works.
I am using the dataset config below:
datasets:
  - path: /content/mar_orca_dataset.json
    type: alpaca_w_system.load_open_orca
    ds_type: json
dataset_prepared_path: /content
dataset_processes: 2
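For context, a minimal sketch of what a dataset row is assumed to look like for the alpaca_w_system.load_open_orca loader, using the OpenOrca column names (system_prompt / question / response); the field names are my assumption and the values are taken from the debug output further below, so treat this as illustrative rather than authoritative:
[
  {
    "system_prompt": "You are an AI assistant. You will be given a task. You must generate a detailed and long answer.",
    "question": "Generate an approximately fifteen-word sentence that describes all this data: Midsummer House eatType restaurant; ...",
    "response": "Midsummer House is a moderately priced Chinese restaurant with a 3/5 customer rating, located near All Bar One."
  }
]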
Current behaviour
Running preprocess with --debug gives the following error:
**** Axolotl Dependency Versions *****
accelerate: 0.28.0
peft: 0.10.0
transformers: 4.40.0.dev0
trl: 0.8.5
torch: 2.1.2
bitsandbytes: 0.43.0
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/content/src/axolotl/src/axolotl/cli/preprocess.py", line 70, in
Steps to reproduce
Run preprocess with the --debug option; the error above is seen.
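Judging from the resolution further down in this thread, the failing invocation presumably placed --debug before the config path, roughly like this (the config path here is illustrative):
python -m axolotl.cli.preprocess --debug /content/config.yml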
Config yaml
base_model: meta-llama/Meta-Llama-3-8B-Instruct
#model_type: AutoModelForCausalLM #For Gemma
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
#datasets:
# - path: /content/test_txt_data-10exmpl.json
# type: completion
# field: text
#datasets:
# - path: ./mar_alpaca_dataset.json
# type: alpaca
# ds_type: json
datasets:
  - path: /content/mar_orca_dataset.json
    type: alpaca_w_system.load_open_orca
    ds_type: json
dataset_prepared_path: /content
dataset_processes: 2
val_set_size: 0
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 700
sample_packing: true
pad_to_sequence_len: true
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
#lora_modules_to_save:
#- embed_tokens
#- lm_head
lora_target_linear: true
lora_fan_in_fan_out:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: False
warmup_ratio: 0.1
evals_per_epoch: 1
eval_table_size:
eval_max_new_tokens: 128
eval_sample_packing: False
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
save_safetensors: True
gpu_memory_limit: 14
Possible solution
There shouldn't be an error.
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.10
axolotl branch-commit
latest
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Hi @amitagh, what is the exact command you used with --debug? Make sure to put --debug after the YAML file argument.
correct: python -m axolotl.cli.preprocess path/to/your.yaml --debug
incorrect: python -m axolotl.cli.preprocess --debug path/to/your.yaml
You are correct. After placing --debug after the YAML file argument, it works.
<|begin_of_text|>(-100, 128000) ###(-100, 14711) System(-100, 744) : (-100, 512) You(-100, 2675) are(-100, 527) an(-100, 459) AI(-100, 15592) assistant(-100, 18328) .(-100, 13) You(-100, 1472) will(-100, 690) be(-100, 387) given(-100, 2728) a(-100, 264) task(-100, 3465) .(-100, 13) You(-100, 1472) must(-100, 2011) generate(-100, 7068) a(-100, 264) detailed(-100, 11944) and(-100, 323) long(-100, 1317) answer(-100, 4320) . (-100, 627) ###(-100, 14711) Human(-100, 11344) : (-100, 512) Generate(-100, 32215) an(-100, 459) approximately(-100, 13489) fifteen(-100, 37755) -word(-100, 38428) sentence(-100, 11914) that(-100, 430) describes(-100, 16964) all(-100, 682) this(-100, 420) data(-100, 828) :(-100, 25) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) eat(-100, 8343) Type(-100, 941) restaurant(-100, 10960) ;(-100, 26) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) food(-100, 3691) Chinese(-100, 8620) ;(-100, 26) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) price(-100, 3430) Range(-100, 6174) moderate(-100, 24070) ;(-100, 26) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) customer(-100, 6130) rating(-100, 10959) (-100, 220) 3(-100, 18) out(-100, 704) of(-100, 315) (-100, 220) 5(-100, 20) ;(-100, 26) Mid(-100, 14013) summer(-100, 63666) House(-100, 4783) near(-100, 3221) All(-100, 2052) Bar(-100, 4821) One(-100, 3861) (-100, 198) ###(-100, 14711) Assistant(-100, 22103) : (-100, 512) Mid(34748, 34748) summer(63666, 63666) House(4783, 4783) is(374, 374) a(264, 264) moderately(70351, 70351) priced(33705, 33705) Chinese(8620, 8620) restaurant(10960, 10960) with(449, 449) a(264, 264) (220, 220) 3(18, 18) /(14, 14) 5(20, 20) customer(6130, 6130) rating(10959, 10959) ,(11, 11) located(7559, 7559) near(3221, 3221) All(2052, 2052) Bar(4821, 4821) One(3861, 3861) .(13, 13) <|end_of_text|>(128001, 128001)
Above is the output it generated, which looks correct for the OpenOrca format: the system and human prompt tokens all carry -100 labels, meaning they are masked from the loss, and only the assistant response tokens carry real label IDs.
Thanks, Amit.