Niklas comments

Results: 213 comments of Niklas

Reshape ZeroStage=0 FP16 Checkpoint

> @Muennighoff, thanks for your question. Can you please clarify a bit more because `zero_stage=0` actually disables ZeRO and is pure DDP. The only reshaping needs that I can imagine...

Reshape ZeroStage=0 FP16 Checkpoint

Yes, I need to continue training in the new shape, so I think I will also need to reshape the optimizer states. I will continue training with zero stage 1, however....
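For context, a minimal sketch of how the two setups discussed here might differ in a DeepSpeed-style config. The keys `zero_optimization.stage`, `fp16.enabled`, and `train_micro_batch_size_per_gpu` are standard DeepSpeed config keys; the helper `make_ds_config` and the batch size value are hypothetical and only for illustration. It does not perform any checkpoint reshaping.

```python
import json


def make_ds_config(zero_stage: int, fp16: bool = True) -> dict:
    """Build a minimal DeepSpeed-style config dict (hypothetical helper)."""
    if zero_stage not in (0, 1, 2, 3):
        raise ValueError(f"unsupported ZeRO stage: {zero_stage}")
    return {
        # Placeholder value, not taken from the thread.
        "train_micro_batch_size_per_gpu": 1,
        "fp16": {"enabled": fp16},
        # Stage 0 disables ZeRO (plain data parallelism);
        # stage 1 partitions the optimizer states across ranks.
        "zero_optimization": {"stage": zero_stage},
    }


# Config the checkpoint was trained with (ZeRO disabled).
old_config = make_ds_config(0)
# Config for continuing training with ZeRO stage 1.
new_config = make_ds_config(1)

print(json.dumps(new_config, indent=2))
```

Only the `stage` value changes between the two configs; whether and how the optimizer states need to be reshaped is the open question in the thread.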

PyTorch implementation as a class

Very nice work, running into "RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time."...

Update BLOOM parameter counts

I think it's fine as old links still work ``` New: Automatic Redirection All links to this model will automatically redirect to the new location, including git operations. However, to...

Update BLOOM parameter counts

Note that the spaces will probably still break, since e.g. `AutoTokenizer.from_pretrained("bigscience/bloom-350m")` no longer works.

Update BLOOM parameter counts

> I think it's fine as old links still work > > ``` > New: Automatic Redirection > All links to this model will automatically redirect to the new location,...

Update BLOOM parameter counts

Let's merge this? I think the damage is done & reverting now would just cause more damage. I will communicate such a change more extensively next time, sorry for the...

Add t0 scripts

> If you can actually use the validation data from T0 then I'd say this is better. For that either a) Add a new arg like `args.data_path` that calls build_train_valid_test_datasets...

Interactive generation script

I can't find any documentation on `max_cpu_memory` - Does this kwarg exist? ```bash Traceback (most recent call last): File "generate.py", line 64, in main() File "generate.py", line 41, in main...

Interactive generation script

Also, I'm pretty sure `max_memory` cannot be a string; it has to be a dictionary.
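To illustrate the dictionary form, here is a small sketch, assuming the convention used by Accelerate-style device maps: integer GPU indices and the string `"cpu"` as keys, with size strings such as `"10GiB"` as values. The helper `build_max_memory` and the sizes are hypothetical, not taken from the script under review.

```python
def build_max_memory(n_gpus: int, per_gpu: str = "10GiB", cpu: str = "30GiB") -> dict:
    """Return a max_memory mapping: one entry per GPU index plus a "cpu" entry.

    Hypothetical helper; the default sizes are placeholders.
    """
    if n_gpus < 0:
        raise ValueError("n_gpus must be non-negative")
    max_memory = {gpu_index: per_gpu for gpu_index in range(n_gpus)}
    max_memory["cpu"] = cpu
    return max_memory


max_memory = build_max_memory(2)
print(max_memory)

# The mapping would then be passed as a keyword argument, for example:
#   AutoModelForCausalLM.from_pretrained(name, device_map="auto", max_memory=max_memory)
```

Building the mapping in one place keeps the per-device limits explicit and avoids passing a single string where a dictionary is expected.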