[Help] Failed to fine-tune Qwen3-4B-Instruct-2507 with LoRA using LLaMA-Factory v0.9.4.dev0
Reminder
- [x] I have read the above rules and searched the existing issues.
System Info
I used LLaMA-Factory v0.9.4.dev0 to fine-tune Qwen3-4B-Instruct-2507 with LoRA, but training failed with the log below. The same training succeeded about three weeks ago.
Error log:
```
W1113 09:26:42.977000 46121 torch/distributed/run.py:774]
W1113 09:26:42.977000 46121 torch/distributed/run.py:774] *****************************************
W1113 09:26:42.977000 46121 torch/distributed/run.py:774] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W1113 09:26:42.977000 46121 torch/distributed/run.py:774] *****************************************
/home/test/.local/lib/python3.10/site-packages/jieba/_compat.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Traceback (most recent call last):
  File "/home/test/.local/lib/python3.10/site-packages/transformers/hf_argparser.py", line 258, in _add_dataclass_arguments
    type_hints: dict[str, type] = get_type_hints(dtype)
  File "/usr/lib/python3.10/typing.py", line 1833, in get_type_hints
    value = _eval_type(value, base_globals, base_locals)
  File "/usr/lib/python3.10/typing.py", line 329, in _eval_type
    ev_args = tuple(_eval_type(a, globalns, localns, recursive_guard) for a in t.args)
  File "/usr/lib/python3.10/typing.py", line 329, in <genexpr>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/test/LLaMA-Factory/src/llamafactory/launcher.py", line 180, in <module>
    run_exp()
  File "/home/test/LLaMA-Factory/src/llamafactory/train/tuner.py", line 110, in run_exp
    _training_function(config={"args": args, "callbacks": callbacks})
  File "/home/test/LLaMA-Factory/src/llamafactory/train/tuner.py", line 55, in _training_function
    model_args, data_args, training_args, finetuning_args, generating_args = get_train_args(args)
  File "/home/test/LLaMA-Factory/src/llamafactory/hparams/parser.py", line 219, in get_train_args
    model_args, data_args, training_args, finetuning_args, generating_args = _parse_train_args(args)
  File "/home/test/LLaMA-Factory/src/llamafactory/hparams/parser.py", line 195, in _parse_train_args
    parser = HfArgumentParser(_TRAIN_ARGS)
  File "/home/test/.local/lib/python3.10/site-packages/transformers/hf_argparser.py", line 143, in __init__
    self._add_dataclass_arguments(dtype)
  File "/home/test/.local/lib/python3.10/site-packages/transformers/hf_argparser.py", line 260, in _add_dataclass_arguments
    raise RuntimeError(
RuntimeError: Type resolution failed for <class 'llamafactory.hparams.training_args.TrainingArguments'>. Try declaring the class in global scope or removing line of `from __future__ import annotations` which opts in Postponed Evaluation of Annotations (PEP 563)
W1113 09:26:48.088000 46121 torch/distributed/elastic/multiprocessing/api.py:900] Sending process 46193 closing signal SIGTERM
E1113 09:26:48.152000 46121 torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 46191) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/test/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 143, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 277, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/home/test/LLaMA-Factory/src/llamafactory/launcher.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time       : 2025-11-13_09:26:48
  host       : test-test-Product
  rank       : 1 (local_rank: 1)
  exitcode   : 1 (pid: 46192)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2025-11-13_09:26:48
  host       : test-test-Product
  rank       : 0 (local_rank: 0)
  exitcode   : 1 (pid: 46191)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
  File "/home/test/.local/bin/llamafactory-cli", line 8, in <module>
```
(The log above is deduplicated: the same warnings and tracebacks were printed once per distributed rank and interleaved; the final `llamafactory-cli` traceback was truncated in the original paste.)
Reproduction
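For reference, a minimal self-contained sketch of the failure mode the `RuntimeError` describes. This mirrors the mechanism only, not LLaMA-Factory's actual `TrainingArguments` class: `HfArgumentParser` calls `typing.get_type_hints()` on each argument dataclass (visible at `hf_argparser.py` line 258 in the traceback), and under PEP 563 that call must re-evaluate annotation strings, which fails when an annotated type is not resolvable from the class's module globals. The explicit string annotation below reproduces the state that `from __future__ import annotations` creates for every annotation.

```python
import typing
from dataclasses import dataclass


def make_args_class():
    # LocalType lives only in this function's scope, not in module globals.
    class LocalType:
        pass

    @dataclass
    class Args:
        # Under PEP 563 every annotation is stored as a string; the explicit
        # string here reproduces that state without the __future__ import.
        value: "LocalType"

    return Args


Args = make_args_class()
try:
    # get_type_hints() re-evaluates the annotation strings in the class's
    # module globals, where "LocalType" does not exist, so it raises NameError
    # (which transformers then wraps in the RuntimeError seen in the log).
    typing.get_type_hints(Args)
    error = None
except NameError as exc:
    error = str(exc)

print("Type resolution failed:", error)
```

This is why the error message suggests declaring the class in global scope or dropping the `from __future__ import annotations` line.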
Others
No response