dan_the_3rd
Unfortunately I didn't manage to repro (Linux, Python 3.10, torch installed via pip for cu124). Not sure what the difference in setup is there...
Hi, just wanted to follow up on this. This is blocking the build of some components of xFormers on Windows - is there a way to do a workaround in the...
Whoops, that's a typo. We removed conda binaries for 3.9, and added 3.11 instead
cc @bottler @sgrigory maybe this initialization can be done lazily?
As a workaround, and if you don't need any triton kernel from xformers, you can try setting this env variable: `XFORMERS_FORCE_DISABLE_TRITON=1`
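Since this variable is checked at import time, it has to be set before xFormers is first imported. A minimal sketch (the `xformers` import itself is left commented out, since whether it succeeds depends on your installation):

```python
import os

# Workaround: disable xFormers' Triton kernels (only safe if you don't need
# any Triton kernel from xFormers). Must be set BEFORE importing xformers.
os.environ["XFORMERS_FORCE_DISABLE_TRITON"] = "1"

# import xformers.ops  # the import now skips Triton initialization

print(os.environ["XFORMERS_FORCE_DISABLE_TRITON"])
```

Setting it in the shell before launching Python (`XFORMERS_FORCE_DISABLE_TRITON=1 python train.py`) works equally well.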
Hi, We don't have documentation for this - as we consider these backends internal details that we would rather not expose publicly (because they can change). But as of today:...
What GPU are you using? Flash-Decoding is supported on `xformers.ops.fmha.flash.FwOp` and [`split_k`](https://github.com/facebookresearch/xformers/blob/main/xformers/ops/fmha/triton_splitk.py), and both require an A100 or newer GPU
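A quick way to sanity-check this without importing xFormers or torch: the "A100 or newer" requirement corresponds (on my reading) to CUDA compute capability sm_80 and up. The helper name and the `>= 8` threshold below are my own illustration, not an xFormers API; you can get your GPU's compute capability from `nvidia-smi`.

```python
# Hedged sketch: map a CUDA compute capability string (e.g. "8.0" for A100,
# "7.5" for T4) to whether the Flash-Decoding backends should be usable.
def supports_flash_decoding(compute_capability: str) -> bool:
    major, _, _ = compute_capability.partition(".")
    return int(major) >= 8  # A100 is sm_80; older architectures are excluded

print(supports_flash_decoding("8.0"))  # A100 -> True
print(supports_flash_decoding("7.5"))  # T4   -> False
```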
They are indeed the same algorithm mathematically (the same mathematical operations), but the way work is parallelized and scheduled is a bit different. Plus the implementation details matter a...