Enrico Shippole

Results: 169 comments by Enrico Shippole

Hi @zhuzilin, I appreciate the insight. One of the deepspeed maintainers stated, "Ulysses is, in principle, attention-type agnostic. Although we haven’t specifically tested Ulysses with Ring Attention, as long as...

Hi @RameshArvind, To use `torch.compile` with einops for training you need to set:

```python
from einops._torch_specific import allow_ops_in_compiled_graph  # requires einops>=0.6.1

allow_ops_in_compiled_graph()
```

I will have to do further...

> [@conceptofmind](https://github.com/conceptofmind) Hi, why not directly using `torch.compile`? I think it would also lead to reasonable speedup. I am currently using `torch.compile` but did not know whether it made sense...

> [@conceptofmind](https://github.com/conceptofmind) In your case, this is necessary. `torch.compile` is still not that wise to avoid one additional activation (which can be recomputed in bwd in a cheap way). But...

Ok haha. I will exclude that then and just focus on `gelu + linear`, `sqrelu + linear`, etc., lol.
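For reference, the `sqrelu + linear` pattern mentioned above can be sketched in plain Python (no framework). The function names and the single-output `linear` helper here are illustrative, not the actual kernels under discussion:

```python
import math

def sqrelu(x):
    """Squared ReLU: max(x, 0) ** 2."""
    return max(x, 0.0) ** 2

def gelu(x):
    """Tanh approximation of GELU."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def linear(xs, weights, bias):
    """Single output unit: dot(weights, xs) + bias."""
    return sum(w * x for w, x in zip(weights, xs)) + bias

def act_then_linear(act, xs, weights, bias):
    """The 'activation + linear' pattern: activation elementwise, then the linear map."""
    return linear([act(x) for x in xs], weights, bias)
```

In practice these two steps are fused into one kernel so the intermediate activation never has to be materialized (or recomputed in the backward pass); the sketch only shows the math being fused.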

> [@conceptofmind](https://github.com/conceptofmind) Hi, I will close this issue since we have implemented most of your request, feel free to open a new PR if you have any issue. > >...

Hi @grantdelozier, EOS was used during training. The EOS and PAD tokens being the same should not be an issue, as you normally just add EOS tokens to the...
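A minimal sketch of why sharing one id between EOS and PAD is workable: EOS is appended to every sequence before padding, and the attention mask (not the token id) tells the model which positions are real. The id value and helper name below are illustrative, not this model's actual tokenizer config:

```python
EOS_ID = 2  # hypothetical id reused for both EOS and PAD

def pad_batch(sequences, eos_id=EOS_ID):
    """Append EOS to each sequence, then right-pad with the same id.

    The attention mask marks real tokens (including the EOS) with 1
    and padding positions with 0, so the model can still tell them apart.
    """
    with_eos = [seq + [eos_id] for seq in sequences]
    max_len = max(len(s) for s in with_eos)
    input_ids, attention_mask = [], []
    for seq in with_eos:
        pad = max_len - len(seq)
        input_ids.append(seq + [eos_id] * pad)
        attention_mask.append([1] * len(seq) + [0] * pad)
    return input_ids, attention_mask
```

For example, `pad_batch([[5, 6], [7]])` pads the shorter sequence with the shared id while its mask zeroes out only the trailing pad position, not the EOS.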

> Apex's Tensor Parallelism is not compatible with LoRA. LoRA is a library for training large language models on GPUs, while Apex is a library for extending PyTorch with new...

@allthingssecurity All models instruction-finetuned on FLAN will be made publicly available as well.

> Thanks for such a quick reply. When can we expect the same? The 2.1b model is training now. 2b won't be done for days. So after that finishes I...