FastChat
Leaving only 45 conversations in dummy.json results in an error
At first we edited the dummy.json file, changing "my name is Vicuna" to "my name is XXXXX" while keeping all the other conversations (910 in total), then trained on it. The new model works fine when answering in English, but fails when we ask it in other languages.
To narrow down the problem, we made the same change but kept only the 45 "who are you" conversations (deleting the other 865), then trained again. This time we hit the following error:
RuntimeError: The size of tensor a (32768512) must match the size of tensor b (262148096) at non-singleton dimension 0
A sketch of our data edit and the full traceback are below. Can anyone help?
Not sure whether this qualifies as an issue, but we could not find a better place to raise the problem.
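For reproducibility, here is a minimal sketch of the edit we made to dummy.json. It assumes the ShareGPT-style layout of FastChat's sample data (a list of records with a "conversations" list of {"from", "value"} turns); the file paths and the keyword filter are illustrative, since we actually picked the 45 conversations by hand.

```python
import json

# Load FastChat's sample training data (path is from our setup).
with open("dummy.json", encoding="utf-8") as f:
    data = json.load(f)

for conv in data:
    for turn in conv["conversations"]:
        # Swap the identity string everywhere it appears.
        turn["value"] = turn["value"].replace("my name is Vicuna",
                                              "my name is XXXXX")

# Keep only the identity conversations; we selected the 45 by hand,
# approximated here with a simple keyword filter (illustrative).
identity_only = [
    conv for conv in data
    if any("who are you" in t["value"].lower()
           for t in conv["conversations"])
]

with open("dummy_small.json", "w", encoding="utf-8") as f:
    json.dump(identity_only, f, indent=2, ensure_ascii=False)
```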
Traceback (most recent call last):
  File "/home/xxxxxx/source/FastChat/fastchat/train/train_mem.py", line 13, in <module>
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/optimizer.py", line 140, in wrapper
    out = func(*args, **kwargs)
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/adamw.py", line 162, in step
    adamw(params_with_grad,
          grads,
          exp_avgs,
          exp_avg_sqs,
          ...)
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/adamw.py", line 219, in adamw
    func(params,
         grads,
         exp_avgs,
         exp_avg_sqs,
         ...)
  File "/home/xxxxxx/miniconda3/envs/fschat/lib/python3.10/site-packages/torch/optim/adamw.py", line 273, in _single_tensor_adamw
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (32768512) must match the size of tensor b (262148096) at non-singleton dimension 0
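One thing we noticed in the error itself: the two tensor sizes differ by exactly a factor of 8. We suspect (unconfirmed) that this matches a sharding world size, e.g. optimizer state built against an FSDP-sharded tensor while the gradient is unsharded, or vice versa:

```python
# The two sizes from the RuntimeError; the exact 8x ratio is suggestive
# of a sharding mismatch (e.g. FSDP world size 8), though that is a guess.
a, b = 32768512, 262148096
print(b / a)       # 8.0
print(b == 8 * a)  # True
```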
Update: the same error occurs on both 0.2.3 and 0.2.5.
Do you have gradient accumulation steps larger than your dataset size?
Not quite sure about this. In my case I changed nothing but the dummy.json file. It seems there is a minimum conversation count required; after testing, we found it is about 100. Really weird.
Oh, I ran into the same problem before. In my case it was because I used a small dataset and set a gradient accumulation step count larger than the dataset size. Training worked normally after I reduced the accumulation steps. Maybe your situation is similar. Hope this helps! A rough sanity check is sketched below.
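To make the check concrete: with gradient accumulation, one optimizer step consumes per_device_train_batch_size × num_gpus × gradient_accumulation_steps samples, so the dataset must contain at least that many. The flag names are the HuggingFace Trainer arguments FastChat's training script passes through; all the numbers except the 45 are illustrative assumptions.

```python
# Illustrative numbers; only num_samples comes from this thread.
num_samples = 45      # conversations left in dummy.json
per_device_batch = 2  # --per_device_train_batch_size (assumed)
num_gpus = 8          # assumed world size
grad_accum = 16       # --gradient_accumulation_steps (assumed)

samples_per_optimizer_step = per_device_batch * num_gpus * grad_accum
if num_samples < samples_per_optimizer_step:
    print(f"dataset ({num_samples}) is smaller than one optimizer step "
          f"({samples_per_optimizer_step}): lower gradient_accumulation_steps")
```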
Thank you, I am new to LLMs. I basically understand your point; I'll try it and see what happens.