dan_the_3rd

Results 235 comments of dan_the_3rd

This is something that will eventually come into xFormers, but not in the very short term. Also I'm curious if you have any data to share regarding numerics for fp8...

Hi, We don't have builds for AMD at the moment. Might come in the future, but can't promise anything (or any ETA) at this point.

Hi, We're following closely what's happening. The current implementation has some bugs which we reported, but when it's ready it will be integrated :) https://github.com/Dao-AILab/flash-attention/issues/1052

Indeed :) We're working on this; hopefully we can land it in xFormers within the next week as an experimental feature

This is taking a bit more time than expected. Hopefully we will have it by next week, but we're not sure.

Hi, Thanks for reporting this bug. Which attention bias are you using? (`type(attn_bias)`) There are 2 solutions to unblock you immediately: (1) Set the correct cuda device for each rank...

> maybe I can set the default cuda device

Yes, this would most likely solve your issue
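For reference, here is a minimal sketch of solution (1), pinning each distributed rank to its own GPU before any attention calls. The helper name `pin_rank_to_device` and the use of the `LOCAL_RANK` environment variable are illustrative assumptions (torchrun sets `LOCAL_RANK`, but your launcher may differ); the key call is `torch.cuda.set_device`, which sets the default CUDA device for the process.

```python
# Hypothetical sketch: set the correct CUDA device for each rank.
# `local_rank` would normally come from your launcher (e.g. torchrun's
# LOCAL_RANK env var); falls back to CPU when no GPU is present.
import os

import torch


def pin_rank_to_device(local_rank: int) -> torch.device:
    """Select and set the default CUDA device for this rank."""
    if torch.cuda.is_available():
        device = torch.device(f"cuda:{local_rank}")
        torch.cuda.set_device(device)  # default device for this process
    else:
        device = torch.device("cpu")
    return device


if __name__ == "__main__":
    rank = int(os.environ.get("LOCAL_RANK", "0"))
    print(pin_rank_to_device(rank))
```

Calling this once per process, before creating tensors or running attention, avoids kernels being launched on the wrong device.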

Hi, What GPU do you have? How did you build xFormers? And what is your benchmark for measuring speed? Some of the components from xFormers have been integrated in PyTorch,...

So for the Triton kernels, we haven't changed them in a while, and they are available as part of PyTorch now. You will get exactly the same speed/result with PyTorch's...

Updating this - we added support for Flash3 by default in xFormers. This is not yet supported in PyTorch's `scaled_dot_product_attention`, so we expect xFormers to be quite a bit faster on H100s,...
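Since both backends compute the same attention, a quick numerical sanity check is easy to sketch. The snippet below is an illustrative, CPU-friendly comparison of PyTorch's fused `scaled_dot_product_attention` against a naive reference; it does not exercise the Flash3 path itself (that dispatch happens inside xFormers on supported GPUs), and `naive_attention` is a hypothetical helper, not part of either library.

```python
# Sketch: verify a fused attention kernel against a naive reference.
# On H100s, xFormers' memory_efficient_attention can dispatch to Flash3;
# this sketch only checks numerics with PyTorch's built-in fused SDPA.
import math

import torch
import torch.nn.functional as F


def naive_attention(q, k, v):
    # (batch, heads, seq, dim): softmax(q @ k^T / sqrt(dim)) @ v
    scale = 1.0 / math.sqrt(q.shape[-1])
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v


torch.manual_seed(0)
q, k, v = (torch.randn(2, 4, 16, 8) for _ in range(3))
fused = F.scaled_dot_product_attention(q, k, v)
assert torch.allclose(fused, naive_attention(q, k, v), atol=1e-5)
```

The same check works with `xformers.ops.memory_efficient_attention(q, k, v)` swapped in for the fused call, modulo that kernel's expected tensor layout.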