xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Traceback (most recent call last):
  File "/home/wumao/xtuner-main/xtuner/tools/model_converters/pth_to_hf.py", line 158, in <module>
    main()
  File "/home/wumao/xtuner-main/xtuner/tools/model_converters/pth_to_hf.py", line 78, in main
    model = BUILDER.build(cfg.model)
  File "/home/wumao/miniconda3/envs/xtuner-env/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs,...
Log output:
```
eta: 0:00:03 time: 0.0768 data_time: 0.0137 memory: 9806
```
What do `time` and `data_time` each mean? Do they include the time spent on backpropagation? What is the unit of `memory`? Thanks.
During training, the program stopped after printing the output below. How can this be resolved? `2024-05-15 09:29:44.939294: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered 2024-05-15 09:29:44.939347: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register...`
After changing the TensorBoard log directory, fine-tuning fails when the step count reaches save_step, with FileNotFoundError: [Errno 2] No such file or directory: '/app/work_dirs/chatglm2_6b_qlora_lawyer_e3_copy/20240514_035914/vis_data/eval_outputs_iter_499.txt'. Debugging shows that, on reaching save_step, xtuner calls the _save_eval_output method in evaluate_chat_hook.py:
```python
def _save_eval_output(self, runner, eval_outputs):
    save_path = os.path.join(runner.log_dir, 'vis_data',
                             f'eval_outputs_iter_{runner.iter}.txt')
    with open(save_path, 'w', encoding='utf-8') as f:
        for i, output in...
```
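One possible workaround for the error above, sketched under the assumption that the failure is caused by the `vis_data` directory not existing once the TensorBoard directory has been moved: create the directory before opening the file. The standalone function `save_eval_output` and the text written per output are hypothetical stand-ins for the truncated method body, not xtuner's actual code.

```python
import os


def save_eval_output(log_dir, iteration, eval_outputs):
    """Write evaluation outputs to <log_dir>/vis_data, creating it if needed."""
    save_dir = os.path.join(log_dir, 'vis_data')
    # Assumption: the FileNotFoundError arises because this directory is
    # missing; exist_ok=True makes the call safe if it already exists.
    os.makedirs(save_dir, exist_ok=True)
    save_path = os.path.join(save_dir, f'eval_outputs_iter_{iteration}.txt')
    with open(save_path, 'w', encoding='utf-8') as f:
        # Hypothetical output format; the original loop body is not shown.
        for i, output in enumerate(eval_outputs):
            f.write(f'Eval output {i + 1}:\n{output}\n\n')
    return save_path
```

Calling `os.makedirs(..., exist_ok=True)` immediately before `open` keeps the fix local to the write, so it does not depend on which logger or visualizer backend would otherwise have created the directory.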
According to the chart in the README, this setup should be trainable, but I keep running out of memory (OOM).
As shown in the figure, there were originally more than 30,000 samples, but in the end only a little over 4,000 remain. How can I track down the cause? The script is as follows:
rm -rf llama3_finetune_pth/*
output_dir=llama3_finetune_pth
config_py=xtuner/configs/llama/llama3_8b_instruct/llama3_8b_instruct_qlora_alpaca_e3.py
CUDA_VISIBLE_DEVICES=0,1 NPROC_PER_NODE=2 xtuner train ${config_py} --work-dir ${output_dir} --deepspeed deepspeed_zero2 --seed 1024
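One thing worth checking for the sample-count drop above is whether the config enables sequence packing (the option name `pack_to_max_length` is assumed here from xtuner's example configs). With packing, many short samples are concatenated into a few sequences of `max_length` tokens, so the reported dataset size shrinks without data being lost. The sketch below illustrates the idea only; it is not xtuner's implementation, and the token counts are made up.

```python
from itertools import chain


def pack_to_max_length(tokenized_samples, max_length):
    """Concatenate token lists and cut the result into max_length chunks."""
    all_tokens = list(chain.from_iterable(tokenized_samples))
    # Simplifying assumption: the trailing remainder that does not fill a
    # whole chunk is dropped.
    n_chunks = len(all_tokens) // max_length
    return [all_tokens[i * max_length:(i + 1) * max_length]
            for i in range(n_chunks)]


# 32 hypothetical short samples of 256 tokens each
samples = [[1] * 256 for _ in range(32)]
packed = pack_to_max_length(samples, max_length=2048)
print(len(samples), '->', len(packed))  # prints "32 -> 4"
```

If the ratio of original to final sample counts roughly matches `max_length` divided by the average sample length in tokens, packing is a plausible explanation.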
Traceback (most recent call last):
  File "/root/autodl-tmp/xtuner/xtuner/tools/model_converters/pth_to_hf.py", line 168, in <module>
    main()
  File "/root/autodl-tmp/xtuner/xtuner/tools/model_converters/pth_to_hf.py", line 81, in main
    model = BUILDER.build(cfg.model)
  File "/root/miniconda3/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs,...
I currently have Huawei NPU cards available for training and testing, but I am not sure which parameters need to be set.