Enrico Shippole

Results 37 issues of Enrico Shippole

Hi @lucidrains, Here are the results for training the GPT2 model on an A100 (40 GB). This is a different A100 I have not used before. I left everything the...

Hi Phil, I was wondering what your thoughts are on adding Flash Attention 2? ```python n, device, h = x.shape[1], x.device, self.heads # pre layernorm x = self.norm(x) # attention...

Hello, Thank you for all of your great work. I am trying to just download and process the English dumps from CommonCrawl up to 2023. I have been running into...

Hi, Thank you for the great research. I am working on implementing the findings from this paper in a different setting using TRLX. Unfortunately, when matching hyperparameters for A2C with...

Hello, A peer of mine ran the benchmark script on an A100. Under what conditions should we see the most significant gain for the sparse 24 linear or activations? ```...

Ring Attention should work with DeepSpeed Ulysses, correct? Are there any notable issues combining DeepSpeed's efficient sequence parallelism with such an attention mechanism? I do understand flash attention works. https://github.com/zhuzilin/ring-flash-attention

Hi, Is a file listing the repositories (repos.txt) available for recreating the results in the sourcegraph notebook? > Once we have initialized our database, we...

Hi, I have been trying to make some progress on the backward kernel for training. Unfortunately, I am new to GPU programming and Triton, so I may be missing parts....


Hi @taki0112, When running the Mobile ViT Python file I receive an error. ```python v = MobileViT( image_size=(256, 256), dims=[96, 120, 144], channels=[16, 32, 48, 48, 64, 64, 80,...

Hi Phil, I have been working with @tomaarsen of HF and @haileyschoelkopf of EAI testing soft moe. One issue that was occurring was that the tensors were not contiguous: ```...