Amit Moryossef (279 comments)

The problem is with small images. Here's a `vit-pytorch`-only implementation:

```py
import time
import statistics
import torch
from vit_pytorch.vit import ViT
from vit_pytorch.na_vit import NaViT as NaViT_orig
from vit_pytorch.na_vit_nested_tensor...
```
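The script above is truncated, but its imports (`time`, `statistics`) suggest a repeated-timing comparison. A minimal sketch of such a timing harness, with illustrative names not taken from the original script, might look like:

```py
import time
import statistics

def benchmark(fn, warmup=3, iters=20):
    """Time fn() repeatedly and report mean/stdev in milliseconds."""
    for _ in range(warmup):
        fn()  # warmup runs are discarded (caches, lazy init, etc.)
    times_ms = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times_ms.append((time.perf_counter() - t0) * 1000)
    return statistics.mean(times_ms), statistics.stdev(times_ms)
```

For GPU models one would also synchronize (e.g. `torch.cuda.synchronize()`) before reading the clock, since CUDA kernels launch asynchronously.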

Now that #353 is merged, here's a better benchmark script with more images (512 variable-width images, 16px tall, 32-80px wide):

```py
import time
import statistics
import random
import torch
from...
```
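The script is cut off, but generating the described input set (512 variable-width images, 16px tall, 32-80px wide) can be sketched as follows; channel count and random seed are assumptions, not from the original:

```py
import random
import torch

random.seed(0)  # assumed seed for reproducibility

# 512 RGB images, all 16px tall, widths drawn uniformly from 32..80
images = [
    torch.randn(3, 16, random.randrange(32, 81))
    for _ in range(512)
]
```

A padded-ViT baseline would then pad every image to the maximum width in the batch, while NaViT packs the variable-width patches directly.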

With #354 merged:

```
pip install "vit-pytorch==1.16.3"
```

| Model        | Time per batch | vs ViT        |
|--------------|----------------|---------------|
| ViT (padded) | 6.1±0.1ms      | 1x (baseline) |

NaViT...

I agree; we should at the very least update support to Python 3.12.

@AI-Guru I have the same problem. Did you manage to convert lakh dataset to tfrecord? If so, could you please share how?

Thanks @mhoangvslev; however, my attention mask is `[batch_size, 1, max_seqlen, max_seqlen]`.
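A `[batch_size, 1, max_seqlen, max_seqlen]` mask of this kind is typically built from per-sequence lengths, with the singleton dimension broadcasting across attention heads. A hedged sketch (the helper name and construction are illustrative, not from the thread):

```py
import torch

def make_4d_padding_mask(lengths, max_len):
    """Build a boolean (B, 1, L, L) mask from valid sequence lengths.

    True means the (query, key) pair may attend; the size-1 dim
    broadcasts over the head dimension.
    """
    # (B, L): True where the position index is within the valid length
    valid = torch.arange(max_len)[None, :] < lengths[:, None]
    # (B, L, L): a pair is valid only if both positions are valid
    pair = valid[:, None, :] & valid[:, :, None]
    return pair[:, None, :, :]  # (B, 1, L, L)
```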

Hi @tridao - any progress on this? Sorry, I am not technical enough to understand all the low-level details here...

I had Claude run a benchmark. `flex_attention` works with 4D masks. Everything ran on an NVIDIA DGX Spark.

● Results: Batch=128, Seq=128, Realistic Masks

| Implementation | pythia-14m 2D... |
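The benchmark itself is truncated, but the underlying point (attention with a 4D `[batch, 1, seq, seq]` mask) can be illustrated with PyTorch's built-in `scaled_dot_product_attention`, which broadcasts such masks across heads. This is a generic sketch, not the `flex_attention` benchmark code, and the shapes are made up:

```py
import torch
import torch.nn.functional as F

# Assumed shapes: (batch, heads, seq_len, head_dim)
B, H, L, D = 2, 4, 8, 16
q = torch.randn(B, H, L, D)
k = torch.randn(B, H, L, D)
v = torch.randn(B, H, L, D)

# Boolean 4D mask of shape (B, 1, L, L); the size-1 dim broadcasts
# over heads. All-True here so no row is fully masked (which would
# produce NaNs in softmax).
mask = torch.ones(B, 1, L, L, dtype=torch.bool)

out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

`flex_attention` expresses the same masking via a `mask_mod`/block-mask callback rather than a dense tensor, which is what makes it attractive at longer sequence lengths.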