Niklas
> @Muennighoff, thanks for your question. Can you please clarify a bit more because `zero_stage=0` actually disables ZeRO and is pure DDP. The only reshaping needs that I can imagine...
Yes, I need to continue training in the new shape, so I think I will also need to reshape the optimizer states. I will continue training with ZeRO stage 1, however....
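For context, DeepSpeed selects the ZeRO stage through the `zero_optimization.stage` key of its JSON config. Stage 0 disables ZeRO (plain data parallelism, every rank holds the full optimizer states). Stage 1 partitions the optimizer states across data-parallel ranks, which is presumably why changing the data-parallel shape also means reshaping those states. A minimal sketch of the two configs, assuming only the two keys shown (a real run sets many more). The helper name `make_ds_config` is hypothetical:

```python
import json


def make_ds_config(zero_stage: int, micro_batch_size: int = 1) -> dict:
    """Return a minimal DeepSpeed config dict for the given ZeRO stage (hypothetical helper)."""
    # DeepSpeed accepts stages 0-3; stage 0 turns ZeRO off entirely.
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError(f"unsupported ZeRO stage: {zero_stage}")
    return {
        # Illustrative value; set this to match the actual training run.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "zero_optimization": {"stage": zero_stage},
    }


# Stage 0: plain data parallelism, each rank keeps the full optimizer states.
print(json.dumps(make_ds_config(0)))
# Stage 1: optimizer states are sharded across data-parallel ranks.
print(json.dumps(make_ds_config(1)))
```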
Very nice work! I'm running into "RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time."...
I think it's fine, as old links still work: ``` New: Automatic Redirection All links to this model will automatically redirect to the new location, including git operations. However, to...
Note that the Spaces will probably still break, as e.g. `AutoTokenizer.from_pretrained("bigscience/bloom-350m")` no longer works.
> I think it's fine as old links still work > > ``` > New: Automatic Redirection > All links to this model will automatically redirect to the new location,...
Shall we merge this? I think the damage is done, and reverting now would just cause more damage. I will communicate such a change more extensively next time, sorry for the...
> If you can actually use the validation data from T0 then I'd say this is better. For that either a) Add a new arg like `args.data_path` that calls build_train_valid_test_datasets...
I can't find any documentation on `max_cpu_memory`. Does this kwarg exist? ```bash Traceback (most recent call last): File "generate.py", line 64, in main() File "generate.py", line 41, in main...
Also, I'm pretty sure `max_memory` cannot be a string; it has to be a dictionary.
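For reference, the transformers `from_pretrained` docs describe `max_memory` as a dictionary mapping device identifiers to a maximum memory budget. GPU indices are the keys for each GPU, and the `"cpu"` key caps CPU RAM, which covers what a separate CPU-memory argument would otherwise do. Below is a minimal sketch of building that dict. The helper `build_max_memory` and the budget values are hypothetical, and the actual model load is left commented out because it needs network access and `accelerate`:

```python
from typing import Dict, Union


def build_max_memory(
    num_gpus: int, gpu_budget: str, cpu_budget: str
) -> Dict[Union[int, str], str]:
    """Build a max_memory mapping: one entry per GPU index plus a "cpu" entry (hypothetical helper)."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    # One budget per GPU, keyed by its integer device index.
    max_memory: Dict[Union[int, str], str] = {i: gpu_budget for i in range(num_gpus)}
    # CPU RAM is capped via the "cpu" key of the same dict.
    max_memory["cpu"] = cpu_budget
    return max_memory


max_memory = build_max_memory(2, "20GiB", "60GiB")
print(max_memory)  # {0: '20GiB', 1: '20GiB', 'cpu': '60GiB'}

# Usage sketch (not run here; requires transformers + accelerate and a download):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "bigscience/bloom", device_map="auto", max_memory=max_memory
# )
```

Passing a plain string such as `"20GiB"` instead of this mapping would not tell the loader which device the budget applies to.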