DHS-LLM-Workshop
Segmentation fault when running train.py
python train.py \
    --model_path "bigcode/starcoderbase-1b" \
    --dataset_name "smangrul/hf-stack-v1" \
    --subset "data" \
    --data_column "content" \
    --split "train" \
    --seq_length 2048 \
    --max_steps 2000 \
    --batch_size 1 \
    --gradient_accumulation_steps 1 \
    --learning_rate 5e-5 \
    --lr_scheduler_type "cosine" \
    --weight_decay 0.01 \
    --num_warmup_steps 30 \
    --eval_freq 100 \
    --save_freq 500 \
    --log_freq 25 \
    --use_reentrant False \
    --num_workers 4 \
    --bf16 \
    --no_fp16 \
    --output_dir "starcoderbase1b-personal-copilot-A100-40GB-colab" \
    --fim_rate 0.5 \
    --fim_spm_rate 0.5 \
    --use_flash_attn
Error output:

/u01/liuys/anaconda3/envs/code/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/u01/liuys/anaconda3/envs/code/lib/python3.10/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
  warnings.warn('Was asked to gather along dimension 0, but all '
Segmentation fault (core dumped)
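One thing worth checking: the command passes `--use_reentrant False`, yet the first UserWarning says `use_reentrant` was never set explicitly. A possible explanation (an assumption here, since the argument parser in train.py is not shown) is the classic argparse pitfall where a flag declared with `type=bool` turns the string "False" into `True`, because any non-empty string is truthy. A minimal sketch of the pitfall and a common workaround:

```python
import argparse

# Pitfall: with type=bool, "--use_reentrant False" still yields True,
# because bool("False") is True (any non-empty string is truthy).
parser = argparse.ArgumentParser()
parser.add_argument("--use_reentrant", type=bool, default=True)
args = parser.parse_args(["--use_reentrant", "False"])
print(args.use_reentrant)  # True, not False!

# Workaround: parse the string explicitly into a real boolean.
def str2bool(v: str) -> bool:
    return v.lower() in ("yes", "true", "t", "1")

parser2 = argparse.ArgumentParser()
parser2.add_argument("--use_reentrant", type=str2bool, default=True)
args2 = parser2.parse_args(["--use_reentrant", "False"])
print(args2.use_reentrant)  # False
```

If train.py has this pattern, gradient checkpointing would silently keep the reentrant default regardless of the flag, which matches the warning in the log (though it does not by itself explain the segfault).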