Enrico Shippole
Hi @zhuzilin, I appreciate the insight. One of the deepspeed maintainers stated, "Ulysses is, in principle, attention-type agnostic. Although we haven’t specifically tested Ulysses with Ring Attention, as long as...
Hi @RameshArvind, To use `torch.compile` with einops for training you need to set:

```python
from einops._torch_specific import allow_ops_in_compiled_graph  # requires einops>=0.6.1

allow_ops_in_compiled_graph()
```

I will have to do further...
> [@conceptofmind](https://github.com/conceptofmind) Hi, why not directly using `torch.compile`? I think it would also lead to reasonable speedup. I am currently using `torch.compile` but did not know whether it made sense...
> [@conceptofmind](https://github.com/conceptofmind) In your case, this is necessary. `torch.compile` is still not that wise to avoid one additional activation (which can be recomputed in bwd in a cheap way). But...
Ok haha. I will exclude that then and just focus on the `gelu + linear`, `sqrelu + linear`, etc lol.
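For context, unfused reference versions of those `activation + linear` patterns look like the sketch below (function names and shapes are mine, for illustration). A fused kernel computes the same thing in one pass, avoiding the intermediate activation tensor these references materialize:

```python
import torch
import torch.nn.functional as F

def gelu_linear_ref(x, weight, bias):
    # Unfused reference: GELU then projection. Two kernel launches and
    # one extra activation tensor that a fused kernel would avoid.
    return F.gelu(x) @ weight.t() + bias

def sqrelu_linear_ref(x, weight, bias):
    # Squared-ReLU variant: relu(x)**2 then projection.
    return F.relu(x).square() @ weight.t() + bias

x = torch.randn(4, 64)
w = torch.randn(32, 64)
b = torch.randn(32)
out_gelu = gelu_linear_ref(x, w, b)
out_sqrelu = sqrelu_linear_ref(x, w, b)
```

These serve as correctness baselines to compare a fused implementation against.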
> [@conceptofmind](https://github.com/conceptofmind) Hi, I will close this issue since we have implemented most of your request, feel free to open a new PR if you have any issue. > >...
Hi @grantdelozier, EOS was used during training. The EOS and PAD tokens being the same should not be an issue, as you normally just add EOS tokens to the...
> Apex's Tensor Parallelism is not compatible with LoRA. LoRA is a library for training large language models on GPUs, while Apex is a library for extending PyTorch with new...
@allthingssecurity All instruction-finetuned models on FLAN will be made publicly available as well.
> Thanks for such a quick reply. When can we expect the same? The 2.1b model is training now. 2b won't be done for days. So after that finishes I...