Andrej
@ent0n29 does this make the code slower for you? I wonder if I should merge to master and potentially get rid of the README comment
Sorry @ent0n29, what is the final recommendation here right now? Is it `CFLAGS = -Ofast -fno-finite-math-only -Wno-unused-result -march=native`?
yeah, agree i think. there are two instances of this btw
i'll come around to this eventually
C is just the simplest thing. Super happy to link to any Mojo port (or anything else) from the main readme! We can benchmark and compare them all :)
GPT-2 is not far away from these SOTA models at all. The most complex new layer that is needed is probably RoPE, and even that is not too complex. My...
Oh one thing I'll say is that this being fp32, you probably don't want to try to work with anything much larger than ~1B. The smallest Llama 2 sadly is...
The [cuBLASLt API](https://docs.nvidia.com/cuda/cublas/#using-the-cublaslt-api) started with CUDA 10.1, which was released Aug 2019. I've been trying to use code that, afaik, can work with fairly old versions of CUDA/cuBLAS. I think...
This is now merged, ty @andylolu2 for pointing this out too.