Puyan Lotfi

Results 9 issues of Puyan Lotfi

(cherry picked from commit 39f4473f9cf431a1890ffcfcbbdc5fffbe4c7984)

TRITON_BUILD_WITH_CLANG_LLD already allows changing the build to use clang+lld instead of gcc+bfd, but it doesn't allow leaving the C/C++ compiler on gcc while mixing and matching linkers like...

NOTE: This is an experiment and a draft. Do not review. The following change requires a private patchset that is not yet available outside of https://github.com/plotfi/triton/pull/4. This patch adds usage...

CLA Signed

Based on #57, this version uses the autotuner to toggle use of TMA.

CLA Signed

Differential Revision: D62598482

CLA Signed
fb-exported

This PR adds BF16 support for atomics. Less precise but cheaper BF16 accumulators have proven to be useful in the context of Split-K, where it is necessary to...

When reduce ops fail to fit within a warp, lots of SMEM operations and sync instructions are generated because, outside of a warp, registers cannot be used to accumulate...

This is a very early first attempt at loading custom dialects. Not ready for review yet. This is a potential follow-on to https://github.com/triton-lang/triton/pull/8401 and https://github.com/triton-lang/triton/pull/8137. This is a WIP...