Dirk Groeneveld issues

Results: 84 issues of Dirk Groeneveld

Kebab7

This is the `kebab` config, a smaller version of the `dirk` config. Differences from `dirk`:
* untied weights
* weight decay on everything
* adjusted `mlp_hidden_size` so we come out...

Llama config with a default layer norm instead of RMS for performance

boto3 client creation is not thread safe

https://github.com/boto/boto3/issues/801 We can probably just wrap our client creation in a mutex? The problem is in util.py, line 520.
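
The mutex idea could look roughly like the sketch below. This is only an illustration under assumptions: `create_client_locked` and `_client_lock` are hypothetical names, not the code in util.py, and the factory is passed in so the sketch runs without boto3 installed. In practice the factory would presumably be `boto3.client`.

```python
import threading

# Hypothetical process-wide lock; every client creation goes through it.
_client_lock = threading.Lock()


def create_client_locked(factory, *args, **kwargs):
    """Call factory(*args, **kwargs) while holding the lock.

    Only the creation call is serialized. The returned object is used
    outside the lock as usual.
    """
    with _client_lock:
        return factory(*args, **kwargs)


# Assumed usage with boto3 (not executed here):
#   import boto3
#   s3 = create_client_locked(boto3.client, "s3")
```

The lock only covers creation, so it should add little overhead if clients are created rarely and then reused.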

Stable AdamW

Linear decay instead of cosine

Shuffle documents within a batch

Improve the CPU unsharder

Our checkpointing mechanism works by running `torch.save()` on each rank. This creates, for each rank, a surprisingly large file that is a pickled state dict. A lot of the tensors...

Compiling the AMD layer norm

This is running with Slurm run ID 4581738.

logit scaling

Try longer warmup (5k) at the 7B scale with mitch init, normal init and fan-in init

This should be a very easy thing to try. Just one setting in the config.

project/model

compute/Mosaic