Tri Dao

Results: 432 comments by Tri Dao

The backward pass is not deterministic due to atomic adds. The forward pass should be deterministic.
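For reference, a minimal sketch (not from the thread) of checking this behavior, assuming `model` is a module whose attention uses FlashAttention and `x` is a CUDA input tensor:

```python
import torch

# Sketch only: `model` and `x` are assumed placeholders, not taken from the repo.
def param_grads(model, x):
    """One forward+backward pass; return a snapshot of all parameter gradients."""
    model.zero_grad(set_to_none=True)
    model(x).sum().backward()
    return [p.grad.clone() for p in model.parameters() if p.grad is not None]

with torch.no_grad():
    out1, out2 = model(x), model(x)
print("forward identical:", torch.equal(out1, out2))  # expected: True

g1, g2 = param_grads(model, x), param_grads(model, x)
# May print False: the backward kernels use atomic adds, so the reduction order
# (and hence floating-point rounding) can differ between runs.
print("backward identical:", all(torch.equal(a, b) for a, b in zip(g1, g2)))
```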

This is normal if you're training the model, but not normal if you're only doing inference (forward pass only).

> > The backward pass is not deterministic due to atomic adds. The forward pass should be deterministic. > > Hi, I am wondering if there is a way to...

> Even for the forward pass, I noticed that results are somewhat unstable in my experiments. Given two inputs `x1` and `x2`, the result of `model(torch.stack([x1, x2]))` (i.e. batching) differs...
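A sketch of that batched-vs-unbatched comparison (assuming `model`, `x1`, and `x2` as in the quote, with `x1`/`x2` being single examples without a batch dimension):

```python
import torch

# Sketch only; shapes are assumed, not taken from the original report.
with torch.no_grad():
    batched = model(torch.stack([x1, x2]))
    singles = torch.stack([model(x1.unsqueeze(0)).squeeze(0),
                           model(x2.unsqueeze(0)).squeeze(0)])
# Small differences here usually come from different reduction orders at different
# batch sizes / kernel configurations rather than from run-to-run non-determinism.
print("max abs diff:", (batched - singles).abs().max().item())
```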

> I also found that mamba will bring randomness during forward propagation and greatly affect model convergence. Can you isolate which layer or function first produces different outputs?
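One way to do that isolation (a sketch, not from the thread; it assumes `model` is an `nn.Module` and `x` a fixed input) is to capture per-module outputs with forward hooks and compare two runs on the same input:

```python
import torch

def capture_outputs(model, x):
    """Run the model once and record each submodule's output tensor."""
    outputs, hooks = {}, []
    for name, module in model.named_modules():
        def hook(mod, inp, out, name=name):
            if torch.is_tensor(out):
                outputs[name] = out.detach().clone()
        hooks.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return outputs

# Hooks fire as each submodule finishes, so dict insertion order approximates
# execution order; the first mismatching entry is the earliest diverging module.
run1, run2 = capture_outputs(model, x), capture_outputs(model, x)
for name in run1:
    if name in run2 and not torch.equal(run1[name], run2[name]):
        print("first mismatch at:", name)
        break
```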

This is very helpful, thanks @Akatsuki030 and @Panchovix. @Akatsuki030 is it possible to fix it by declaring these variables (Headdim, kBlockM) with `constexpr static int` instead of `constexpr int`? I've...

Great, thanks for the confirmation @Panchovix. I'll cut a release now (v2.3.2). Ideally we'd set up prebuilt CUDA wheels for Windows at some point so folks can just download instead...

I see, thanks for the confirmation. I guess we rely on Cutlass, and Cutlass requires CUDA 12.x to build on [Windows](https://github.com/NVIDIA/cutlass/blob/main/media/docs/build/building_in_windows_with_visual_studio.md).

> Another note, it may be a good idea to build wheels for cu121 as well, since github actions currently doesn't build for that version. Right now github actions only...