Enrico Shippole
@lucidrains Here are the runs for PaLM **with** flash-cosine-sim-attention.
- 6.52 s/it
- Sequence length 8192
- fp32

For 14k steps: [screenshot] And for the whole training run: ...
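For reference, here is a minimal sketch of the kernel call at the settings quoted above (sequence length 8192, fp32). The batch size, head count, and head dimension are placeholders rather than the PaLM run's actual values, and the `causal` keyword is assumed from the repository's README; this is not the benchmark script itself.

```python
import torch
from flash_cosine_sim_attention import flash_cosine_sim_attention

# Placeholder shapes (batch, heads, seq_len, dim_head); only the
# sequence length (8192) and dtype (fp32, the torch.randn default)
# match the run described above. The kernel requires a CUDA device.
q = torch.randn(1, 8, 8192, 64).cuda()
k = torch.randn(1, 8, 8192, 64).cuda()
v = torch.randn(1, 8, 8192, 64).cuda()

# Single fused attention call; `causal=True` (keyword assumed from the
# repo README) applies the autoregressive mask a PaLM decoder needs.
out = flash_cosine_sim_attention(q, k, v, causal=True)  # (1, 8, 8192, 64)
```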
@lucidrains Here is the code for a ViT-16 **with** flash-cosine-sim-attention:

```python
import torch
from torch import nn
from einops import rearrange, repeat
from einops.layers.torch import Rearrange
from flash_cosine_sim_attention import flash_cosine_sim_attention
...
```
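The snippet above is cut off after the imports, so here is a minimal sketch of how the kernel can slot into a ViT attention block. The `CosineSimAttention` class name and the `dim`/`heads`/`dim_head` defaults are illustrative assumptions, not the exact ViT-16 configuration; only the `flash_cosine_sim_attention(q, k, v)` call itself follows the repository's documented usage.

```python
import torch
from torch import nn
from einops import rearrange
from flash_cosine_sim_attention import flash_cosine_sim_attention


class CosineSimAttention(nn.Module):
    """Multi-head attention block that delegates the attention computation
    to the fused flash-cosine-sim CUDA kernel (GPU only).
    Hyperparameter defaults are illustrative, not the ViT-16 values."""

    def __init__(self, dim=512, heads=8, dim_head=64):
        super().__init__()
        inner_dim = heads * dim_head
        self.heads = heads
        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)
        self.to_out = nn.Linear(inner_dim, dim, bias=False)

    def forward(self, x):
        # x: (batch, seq, dim) -> q, k, v: (batch, heads, seq, dim_head)
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        q, k, v = (rearrange(t, 'b n (h d) -> b h n d', h=self.heads) for t in (q, k, v))

        # fused cosine-sim attention kernel, called as in the repo README
        out = flash_cosine_sim_attention(q, k, v)

        out = rearrange(out, 'b h n d -> b n (h d)')
        return self.to_out(out)
```

The kernel expects `q`, `k`, `v` shaped `(batch, heads, seq, dim_head)` and the tensors have to live on a CUDA device.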
@lucidrains Here are the results for training the ViT-16 **with** flash-cosine-sim-attention on CIFAR10 for 100 epochs.
Train and Validation loss: [screenshot]
Train and Validation accuracy: [screenshot]
@lucidrains Of course. I will now train a ViT-16 with regular attention on CIFAR10 for 100 epochs and compare the curves side by side. I will update you when everything...
@lucidrains Here are the results for the ViT-16 experiments with and without flash-cosine-sim attention. For regular attention I used a learning rate of 2e-4. For flash-cosine-sim I tested with a...
@lucidrains Here are the results for ViT-16 with and without flash-cosine-sim attention on CIFAR10 for 100 epochs with the same learning rate of 2e-4. I am using an A100 (40 GB).
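For clarity on the setup, here is a hedged sketch of the single switch point between the two runs; the `attend` helper and the `use_cosine_sim` flag are hypothetical names, and everything else in the ViT (patch embedding, MLP, the 2e-4 learning rate, 100 epochs) stays identical between the runs.

```python
import torch
from flash_cosine_sim_attention import flash_cosine_sim_attention


def attend(q, k, v, use_cosine_sim=False):
    """q, k, v: (batch, heads, seq, dim_head). Only this call differs
    between the 'with' and 'without' flash-cosine-sim CIFAR10 runs."""
    if use_cosine_sim:
        # fused flash-cosine-sim CUDA kernel (needs a GPU, e.g. the A100)
        return flash_cosine_sim_attention(q, k, v)
    # regular softmax attention baseline
    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-1, -2) * scale).softmax(dim=-1)
    return attn @ v
```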
Firstly, it costs roughly $20,000 to $80,000 to train a small foundation model, but that is the plan if I can get funding. I still need to finish collecting both datasets...
> You means that "it is forbidden by the Terms of Service (ToS) of OpenAI ChatGPT". Thank you for your response to this issue.
>
> Maybe the open source...
There is currently no way to `pip install`; this will be added in the future. For now, you would have to `git clone https://github.com/conceptofmind/LaMDA-rlhf-pytorch.git`, then `cd` into the LaMDA-rlhf-pytorch directory. From there...
Hi @samadejacobs, I appreciate the insight. I will have to test both of them in conjunction and let you know. Thank you, Enrico