Enrico Shippole
Hi @lucidrains, Here are the results for training the GPT2 model on an A100 (40 GB). This is a different A100 from the one I used before. I left everything the...
Hi Phil, I was wondering what your thoughts are on adding Flash Attention 2?

```python
n, device, h = x.shape[1], x.device, self.heads

# pre layernorm
x = self.norm(x)

# attention...
```
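For context, here is roughly the shape of the change I have in mind: a minimal sketch that routes the attention through PyTorch's `F.scaled_dot_product_attention`, which dispatches to a FlashAttention kernel (FlashAttention-2 in recent PyTorch builds) on supported GPUs. The module layout, dimensions, and projection names below are assumptions for illustration, not the repository's actual code.

```python
import torch.nn.functional as F
from torch import nn
from einops import rearrange

class Attention(nn.Module):
    # hypothetical module mirroring the snippet above; all dims are assumptions
    def __init__(self, dim, heads = 8, dim_head = 64):
        super().__init__()
        inner_dim = heads * dim_head
        self.heads = heads
        self.norm = nn.LayerNorm(dim)
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias = False)
        self.to_out = nn.Linear(inner_dim, dim, bias = False)

    def forward(self, x):
        n, device, h = x.shape[1], x.device, self.heads

        # pre layernorm
        x = self.norm(x)

        # project to queries, keys, values and split out the heads
        q, k, v = self.to_qkv(x).chunk(3, dim = -1)
        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = h), (q, k, v))

        # fused attention - uses a flash kernel when one is available
        out = F.scaled_dot_product_attention(q, k, v, is_causal = True)

        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)
```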
Hello, Thank you for all of your great work. I am trying to download and process just the English dumps from CommonCrawl up to 2023. I have been running into...
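In case it is useful, this is the kind of entry point I am working from: a minimal sketch that lists the WET (extracted plain text) files for a single crawl, assuming the standard CommonCrawl layout on `data.commoncrawl.org`. The crawl label below is just one of the 2023 crawls, and language filtering would still need to happen downstream.

```python
import gzip
import io

import requests

# CC-MAIN-2023-06 is one of the early-2023 crawls; swap in whichever
# crawl labels you need (the full list is at https://index.commoncrawl.org/)
CRAWL = "CC-MAIN-2023-06"
BASE = "https://data.commoncrawl.org"

# each crawl publishes a gzipped list of the paths to its WET files
resp = requests.get(f"{BASE}/crawl-data/{CRAWL}/wet.paths.gz", timeout = 60)
resp.raise_for_status()

with gzip.open(io.BytesIO(resp.content), "rt") as f:
    wet_paths = [line.strip() for line in f]

# download individual segments from BASE + path; language identification
# (e.g. a fastText model) would be applied after extracting the records
url = f"{BASE}/{wet_paths[0]}"
print(f"{len(wet_paths)} WET files in {CRAWL}; first: {url}")
```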
Hi, Thank you for the great research. I am working on implementing the findings from this paper in a different setting using TRLX. Unfortunately, when matching hyperparameters for A2C with...
Hello, A peer of mine ran the benchmark script on an A100. Under what conditions should we see the most significant gain for the sparse 2:4 ("sparse 24") linear or activations?

```
...
```
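For reference, this is the kind of micro-benchmark I would expect to isolate the gain: a minimal sketch using PyTorch's prototype `to_sparse_semi_structured` API (PyTorch >= 2.1, fp16, Ampere or newer). The shapes and timing harness are my own assumptions, not the repository's benchmark script.

```python
import torch
from torch.sparse import to_sparse_semi_structured

# 2:4 semi-structured kernels want fp16/bf16 on an Ampere+ GPU, and the
# speedup mostly shows up at large, GEMM-friendly shapes like these
m, k, n = 4096, 4096, 4096
mask = torch.Tensor([0, 0, 1, 1]).tile((m, k // 4)).cuda().bool()
w_dense = torch.rand(m, k).half().cuda() * mask  # weights already in a 2:4 pattern
x = torch.rand(k, n).half().cuda()

w_sparse = to_sparse_semi_structured(w_dense)

def bench(fn, iters = 100):
    # simple CUDA-event timing with a warmup, in milliseconds per call
    start = torch.cuda.Event(enable_timing = True)
    end = torch.cuda.Event(enable_timing = True)
    for _ in range(10):
        fn()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

print("dense :", bench(lambda: torch.mm(w_dense, x)), "ms")
print("sparse:", bench(lambda: torch.mm(w_sparse, x)), "ms")
```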
Ring Attention should work with DeepSpeed Ulysses, correct? Are there any notable issues combining DeepSpeed's efficient sequence parallelism with such an attention mechanism? I do understand that flash attention works. https://github.com/zhuzilin/ring-flash-attention
Hi, Is there a file with the list of repositories (repos.txt) available for recreating the results in the Sourcegraph notebook?

> Once we have initialized our database, we...
Hi, I have been trying to make some progress on the backward kernel for training. Unfortunately, I am new to GPU programming and Triton, so I may be missing parts...
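To make sure I understand the wiring before tackling the real kernel, here is the pattern I am following: a minimal, self-contained sketch of a Triton kernel plugged into `torch.autograd.Function` as a backward pass, using a trivial ReLU gradient as a stand-in. None of the names here come from this repository.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def relu_bwd_kernel(grad_out_ptr, inp_ptr, grad_in_ptr, n_elements, BLOCK: tl.constexpr):
    # each program instance handles one BLOCK-sized slice of the flattened tensors
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    g = tl.load(grad_out_ptr + offs, mask = mask)
    x = tl.load(inp_ptr + offs, mask = mask)
    # dL/dx = dL/dy where x > 0, else 0
    tl.store(grad_in_ptr + offs, tl.where(x > 0, g, 0.0), mask = mask)

class TritonReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.relu(x)  # forward kept in plain torch for brevity

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        grad_in = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)
        relu_bwd_kernel[grid](grad_out.contiguous(), x, grad_in, n, BLOCK = 1024)
        return grad_in

x = torch.randn(4096, device = "cuda", requires_grad = True)
TritonReLU.apply(x).sum().backward()
print(torch.allclose(x.grad, (x > 0).float()))  # sanity check against the analytic gradient
```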
Hi @taki0112, When running the MobileViT Python file I receive an error.

```python
v = MobileViT(
    image_size = (256, 256),
    dims = [96, 120, 144],
    channels = [16, 32, 48, 48, 64, 64, 80, ...
```
Hi Phil, I have been working with @tomaarsen of HF and @haileyschoelkopf of EAI on testing soft MoE. One issue that was occurring was that the tensors were not contiguous:

```
...
```
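For anyone hitting the same thing, a minimal sketch of the general symptom and the usual fix; the actual call site inside soft-moe differs, this just reproduces the error class in isolation.

```python
import torch

x = torch.randn(2, 8, 64)

# transposing returns a non-contiguous view of the same storage
y = x.transpose(1, 2)
print(y.is_contiguous())  # False

# .view() on a non-contiguous tensor raises a RuntimeError,
# e.g. "view size is not compatible with input tensor's size and stride"
# y.view(2, -1)  # would raise

# fix: materialize a contiguous copy before reshaping, or use
# .reshape(), which only copies when it has to
z = y.contiguous().view(2, -1)
z = y.reshape(2, -1)
```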