
Error when using FP16 or Mixed precision

Open Vaccummer opened this issue 1 year ago • 3 comments

When I train the model in fp16 or mixed precision on a V100 SXM2 16GB, loss.backward() raises the following error:

python: /project/lib/Analysis/Allocation.cpp:40: std::pair<llvm::SmallVector, llvm::SmallVector > mlir::triton::getCvtOrder(mlir::Attribute, mlir::Attribute): Assertion `!(srcMmaLayout && dstMmaLayout) && "Unexpected mma -> mma layout conversion"' failed.
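For context, a minimal sketch of the kind of mixed-precision training loop that hits this; the model construction follows the repo's README example and is only illustrative of the pattern, and the assertion above is raised from the Triton-compiled kernels during the backward pass:

```python
import torch
from mamba_ssm import Mamba

device = "cuda"
# Block config taken from the README example; sizes here are illustrative.
model = Mamba(d_model=256, d_state=16, d_conv=4, expand=2).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(8, 1024, 256, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(dtype=torch.float16):
    out = model(x)
    loss = out.float().pow(2).mean()  # dummy loss, just to exercise backward
scaler.scale(loss).backward()         # the Triton assertion fires here on V100
scaler.step(optimizer)
scaler.update()
```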

Vaccummer avatar Jul 17 '24 12:07 Vaccummer

Triton doesn't support V100 very well.

tridao avatar Jul 17 '24 18:07 tridao

Triton doesn't support V100 very well.

Thanks for the reply. Is there any solution, or an alternative to Triton?

Vaccummer avatar Jul 22 '24 08:07 Vaccummer

There's a reference implementation in PyTorch, but it would probably be quite a bit slower.
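To illustrate what such a pure-PyTorch (non-Triton) path looks like, here is a minimal, self-contained selective-scan sketch. The function name, argument shapes, and signature are illustrative only and do not mirror the library's internal reference functions:

```python
import torch

def selective_scan_reference(u, delta, A, B, C, D):
    """Sequential SSM scan:
    h_t = exp(delta_t * A) * h_{t-1} + delta_t * B_t * u_t,  y_t = C_t . h_t + D * u_t.
    u, delta: (batch, dim, length); A, D: per-channel parameters; B, C: (batch, d_state, length)."""
    batch, dim, length = u.shape
    d_state = A.shape[-1]
    h = u.new_zeros(batch, dim, d_state)
    ys = []
    for t in range(length):
        dt = delta[:, :, t].unsqueeze(-1)                                  # (batch, dim, 1)
        h = torch.exp(dt * A) * h \
            + dt * B[:, :, t].unsqueeze(1) * u[:, :, t].unsqueeze(-1)      # (batch, dim, d_state)
        y = (h * C[:, :, t].unsqueeze(1)).sum(-1) + D * u[:, :, t]         # (batch, dim)
        ys.append(y)
    return torch.stack(ys, dim=-1)                                         # (batch, dim, length)

# Illustrative shapes: batch=2, dim=4, length=8, d_state=16
u = torch.randn(2, 4, 8)
delta = torch.rand(2, 4, 8)
A = -torch.rand(4, 16)            # negative real part keeps the recurrence stable
B = torch.randn(2, 16, 8)
C = torch.randn(2, 16, 8)
D = torch.randn(4)
y = selective_scan_reference(u, delta, A, B, C, D)
```

The explicit Python loop over the sequence length is why such a fallback is expected to be noticeably slower than the fused Triton/CUDA kernels.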

tridao avatar Jul 22 '24 16:07 tridao

Hi @tridao, just to confirm and for documentation: we had the same error on an older laptop GPU, and switching back to full precision resolved the issue for us.

Cheers, Sascha
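A minimal sketch of that fallback, assuming a standard PyTorch setup (model construction again follows the README example and is illustrative): keep the model and inputs in float32 and make sure no fp16 autocast context wraps the forward pass.

```python
import torch
from mamba_ssm import Mamba

model = Mamba(d_model=256, d_state=16, d_conv=4, expand=2).to("cuda").float()
x = torch.randn(2, 512, 256, device="cuda", dtype=torch.float32)

# Explicitly disable autocast so everything runs in full precision.
with torch.autocast(device_type="cuda", enabled=False):
    y = model(x)
    loss = y.pow(2).mean()  # dummy loss, just to exercise backward
loss.backward()
```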

sascha-kirch avatar Nov 16 '24 10:11 sascha-kirch