Dirk Groeneveld

Results 84 issues of Dirk Groeneveld

- [x] Mitch (we already ran this anyway)
- [x] Our own
- [x] Kaiming

We don't understand Kaiming very well and it's not implemented yet, so this one is...

project/model
compute/Mosaic

The theory is that the second moment goes to zero, resulting in a big update, which results in a loss spike.

- [x] Generate some checkpoints closer to the spike...

project/model
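
To make the second-moment theory concrete, here is a minimal sketch of a scalar Adam-style update. It assumes the optimizer is Adam-like with textbook defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`); the issue does not name the optimizer or its settings, and `adam_update_sizes` is a hypothetical helper, not code from the repository.

```python
import math

def adam_update_sizes(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return |update| per step for one scalar parameter under Adam.

    Assumption: plain Adam with bias correction; hyperparameters are
    textbook defaults, not the values used in the actual training run.
    """
    m = 0.0  # first-moment estimate (running mean of gradients)
    v = 0.0  # second-moment estimate (running mean of squared gradients)
    sizes = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
        sizes.append(abs(lr * m_hat / (math.sqrt(v_hat) + eps)))
    return sizes

lr = 1e-3
# A long stretch of near-zero gradients lets v decay toward zero,
# then a gradient of ordinary size arrives.
quiet_then_jump = [1e-10] * 10_000 + [1e-2]
sizes = adam_update_sizes(quiet_then_jump, lr=lr)
print(f"update during the quiet phase:    {sizes[-2] / lr:.3f} x lr")
print(f"update when the gradient returns: {sizes[-1] / lr:.3f} x lr")
```

With steady gradients the update stays close to `lr`. Once `v` has decayed, a single ordinary gradient is divided by a very small `sqrt(v_hat)`, so the step jumps to several times `lr` (about `(1 - beta1) / sqrt(1 - beta2)` times `lr` under these assumed defaults). That is the kind of oversized update the theory blames for the spike.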

The PaLM paper has a short section of tweaks to the vanilla Transformer architecture. We should make sure we have all of those.

project/model
severity/should

One experiment is, let's just keep running the 7B and see if it recovers from the spikes.

project/model
compute/IB

https://github.com/allenai/LLM/blob/2118db56095157474fe1c69c1702db08af2d4f74/scripts/train.py#L187

I think having a checkpoint before any training happens would be quite useful.

We can't run in a debugger anymore.

project/model
severity/should
difficulty/medium

Activation checkpointing needs to keep track of the state of the random number generator, which fails with `torch.compile()`. Rumor has it that the latest torch nightly has this fixed, so...

Maybe wandb will take care of this for us? I opened a ticket with them.

project/model
severity/should
status/blocked

There are some checkpoints that we want to keep forever, because they are part of our output. The checkpoint saving code needs to know about those.

project/model
severity/must
difficulty/medium
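
One way the saving code could know about permanent checkpoints is to pass it an explicit set of protected steps. This is a rough sketch only: `checkpoints_to_delete`, `keep_latest`, and `keep_forever` are hypothetical names, and it assumes checkpoints are identified by their integer training step, which the issue does not specify.

```python
def checkpoints_to_delete(steps, keep_latest, keep_forever):
    """Return the checkpoint steps that are safe to delete, oldest first.

    Assumptions (not from the repository): each checkpoint is identified
    by its training step, the newest `keep_latest` checkpoints are always
    retained, and `keep_forever` lists steps that must never be removed.
    """
    ordered = sorted(set(steps))
    recent = set(ordered[-keep_latest:]) if keep_latest > 0 else set()
    protected = set(keep_forever)
    return [s for s in ordered if s not in recent and s not in protected]

# Steps 0 and 1000 are part of the released output; keep the 2 newest as well.
to_delete = checkpoints_to_delete(
    steps=[0, 1000, 2000, 3000, 4000],
    keep_latest=2,
    keep_forever={0, 1000},
)
print(to_delete)
```

Keeping the decision in a pure function separates it from the actual file deletion, so the retention rule can be unit-tested without touching storage.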

project/model
severity/must
difficulty/medium