Firefly
QLoRA fine-tuning Mixtral 8x7B error
Traceback (most recent call last):
File "./train_qlora.py", line 235, in find_unused_parameters=True
to torch.nn.parallel.DistributedDataParallel
, and by
making sure all forward
function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward
function. Please include the loss function and the structure of the return value of forward
of your module when reporting this issue (e.g. list, dict, iterable).
Hello, may I ask if you have solved this problem?
+1 also have this problem
In train.py: change it to training_args.ddp_find_unused_parameters = True and it works.
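For reference, a minimal sketch of what this change amounts to, assuming the script builds a standard transformers.TrainingArguments (the output path below is a placeholder, not Firefly's real config handling):

```python
from transformers import TrainingArguments

# Placeholder arguments; the real train.py has its own config handling.
training_args = TrainingArguments(output_dir="output/firefly-mixtral-8x7b-qlora")

# Equivalent to passing find_unused_parameters=True to
# torch.nn.parallel.DistributedDataParallel when the Trainer wraps the model.
# Mixtral's MoE routing can leave unselected experts without gradients in a
# given step, so DDP has to be told to scan for unused parameters instead of
# raising the error above.
training_args.ddp_find_unused_parameters = True
```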
After setting training_args.ddp_find_unused_parameters = True, I'm running into this error: RuntimeError: Expected to mark a variable ready only once. This happened with both a single GPU and multiple GPUs.
It's said that setting ddp_find_unused_parameters=False fixes this, so it seems like a bug. Can anyone solve it?
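One combination that is often reported to produce this second error is reentrant gradient checkpointing together with find_unused_parameters=True; whether that is what happens here is not confirmed. A hedged sketch of that workaround, assuming a recent transformers version (>= 4.35) and a placeholder output path:

```python
from transformers import TrainingArguments

# Sketch of a commonly reported workaround, not confirmed for this issue:
# reentrant activation checkpointing re-runs the forward pass during backward,
# which together with find_unused_parameters=True can mark the same parameter
# ready twice. Requesting the non-reentrant implementation avoids that.
training_args = TrainingArguments(
    output_dir="output/firefly-mixtral-8x7b-qlora",  # placeholder path
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
)
training_args.ddp_find_unused_parameters = True
```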