Iz Beltagy


Can you try this for regular BERT and see if you get the same pattern?

As I said, I don't think this is a bug; it's just how the model decided to represent your tokens. As for the similarity measures, maybe normalizing the vector...
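
The comment above is cut off, but assuming the suggestion is to normalize the token vectors before comparing them (i.e. use cosine similarity rather than raw dot products), a minimal sketch with a BERT checkpoint might look like the following; the model name and input sentence are only illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("a short example sentence", return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_dim)

# Normalize each token vector to unit length; the dot product of unit vectors
# is the cosine similarity, which removes the effect of vector magnitude.
normed = torch.nn.functional.normalize(hidden, dim=-1)
similarity = normed @ normed.T  # (seq_len, seq_len) pairwise similarities
print(similarity)
```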

1. The official package is better if it has gradient accumulation (they have an open PR for it: https://github.com/allenai/allennlp/pull/3051)
2. What do you mean by regular dependency?
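
For context, the sketch below shows roughly what gradient accumulation does. The AllenNLP PR linked above implements it inside the library's Trainer, so this plain-PyTorch loop with a toy model is only an illustration of the idea, not that PR's code:

```python
import torch
from torch import nn

model = nn.Linear(10, 1)                   # toy model standing in for a real one
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]  # toy data

accumulation_steps = 4
optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    # Scale the loss so the accumulated gradient averages over the micro-batches.
    loss = nn.functional.mse_loss(model(x), y) / accumulation_steps
    loss.backward()                        # gradients accumulate until the next optimizer step
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                   # one update per `accumulation_steps` micro-batches
        optimizer.zero_grad()
```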

[Here](https://huggingface.co/bigscience/gpt2-350m-en/tree/megatron-deepspeed)'s a megatron-deepspeed checkpoint and [here](https://huggingface.co/bigscience/gpt2-350m-en/tree/main)'s the corresponding HF-transformer checkpoint. We just need to verify that these two are the same.
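
One way to check this, assuming the Meg-DS checkpoint has already been converted to the Hugging Face format and saved locally (the `megds_converted/` directory below is a hypothetical path for that converted copy), is to compare the two state dicts tensor by tensor. Only the `bigscience/gpt2-350m-en` repo and its `main` revision are taken from the links above; the rest is a sketch:

```python
import torch
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("bigscience/gpt2-350m-en", revision="main")
megds_model = AutoModelForCausalLM.from_pretrained("megds_converted")  # hypothetical local copy

hf_state, megds_state = hf_model.state_dict(), megds_model.state_dict()
assert hf_state.keys() == megds_state.keys(), "parameter names differ"

# Report any weight tensor that does not match between the two checkpoints.
for name, tensor in hf_state.items():
    if not torch.allclose(tensor, megds_state[name], atol=1e-5):
        print(f"mismatch in {name}")
```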

Yes, to run Meg-DS training. Basically, do the steps listed in the README here https://github.com/bigscience-workshop/Megatron-DeepSpeed for them, so that they only need to run the `pretrain_*` script.

@jaketae can be the first user of the AMI

Dirk's config is in this branch: https://github.com/allenai/LLM/tree/DirksRun2