Enrico Shippole
Hi @zhuzilin, I appreciate the insight. One of the deepspeed maintainers stated, "Ulysses is, in principle, attention-type agnostic. Although we haven’t specifically tested Ulysses with Ring Attention, as long as...
Hi @RameshArvind, To use `torch.compile` with einops for training you need to set:

```python
from einops._torch_specific import allow_ops_in_compiled_graph  # requires einops>=0.6.1

allow_ops_in_compiled_graph()
```

I will have to do further...
> [@conceptofmind](https://github.com/conceptofmind) Hi, why not directly using `torch.compile`? I think it would also lead to reasonable speedup. I am currently using `torch.compile` but did not know whether it made sense...
> [@conceptofmind](https://github.com/conceptofmind) In your case, this is necessary. `torch.compile` is still not that wise to avoid one additional activation (which can be recomputed in bwd in a cheap way). But...
Ok haha. I will exclude that then and just focus on the `gelu + linear`, `sqrelu + linear`, etc lol.
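For context, unfused reference versions of those `activation + linear` patterns look like the sketch below (function names and shapes are mine, for illustration). A fused kernel computes the same thing in one pass, avoiding the intermediate activation tensor these references materialize:

```python
import torch
import torch.nn.functional as F

def gelu_linear_ref(x, weight, bias):
    # Unfused reference: GELU then projection. Two kernel launches and
    # one extra activation tensor that a fused kernel would avoid.
    return F.gelu(x) @ weight.t() + bias

def sqrelu_linear_ref(x, weight, bias):
    # Squared-ReLU variant: relu(x)**2 then projection.
    return F.relu(x).square() @ weight.t() + bias

x = torch.randn(4, 64)
w = torch.randn(32, 64)
b = torch.randn(32)
out_gelu = gelu_linear_ref(x, w, b)
out_sqrelu = sqrelu_linear_ref(x, w, b)
```

These serve as correctness baselines to compare a fused implementation against.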
> [@conceptofmind](https://github.com/conceptofmind) Hi, I will close this issue since we have implemented most of your request, feel free to open a new PR if you have any issue. > >...
Hi @grantdelozier, EOS was used during training. The EOS and PAD tokens being the same should not be an issue, as you normally just add EOS tokens to the...
> Apex's Tensor Parallelism is not compatible with LoRA. LoRA is a library for training large language models on GPUs, while Apex is a library for extending PyTorch with new...
@allthingssecurity All instruction-finetuned models on FLAN will be made publicly available as well.
> Thanks for such a quick reply. When can we expect the same? The 2.1b model is training now. 2b won't be done for days. So after that finishes I...