Puyan Lotfi
First draft: Notepad for Examples.
(cherry picked from commit 39f4473f9cf431a1890ffcfcbbdc5fffbe4c7984)
TRITON_BUILD_WITH_CLANG_LLD already allows switching the build to clang+lld instead of gcc+bfd, but it doesn't allow keeping gcc as the C/C++ compiler while mixing and matching linkers like...
NOTE: This is an experiment and a draft. Do not review. The following change requires a private patchset that is not yet available outside of https://github.com/plotfi/triton/pull/4. This patch adds usage...
Based on #57, this version uses the autotuner to toggle use of TMA.
Differential Revision: D62598482
This PR adds BF16 support for atomics, which are less precise but cheaper. BF16 accumulators have proven useful in the context of Split-K, where it is necessary to...
When reduce ops fail to fit within a warp, many SMEM operations and sync instructions are generated because, outside of a warp, registers cannot be used to accumulate...
This is a very early first attempt at loading custom dialects. Not ready for review yet. This is a potential follow-on to https://github.com/triton-lang/triton/pull/8401 and https://github.com/triton-lang/triton/pull/8137. This is a WIP...