applegpu
Apple G13 GPU architecture docs and tools
It appears that the Apple GPU doesn't have the vector and scalar support that AMD (GCN/RDNA) has?
Hi, I'm not sure what the repo is really intended for. But looking carefully, I think one important class of instructions is missing: the "atom like" instructions?
Sorry if this doesn't belong here, but I'm not sure whether this should work with an `Apple M1, Sonoma 14.2.1 (23C71)` ```test.py from ast import Tuple import os...
You give one of the two operand hints the name discard. The Apple patents refer to this as last_use, which I think makes more sense and makes it clear how...
Found some interesting things in the `pow` function ```sh echo "kernel void test(uint pos [[thread_position_in_grid]], device float* out, const device float2* in) { out[pos] = metal::pow(in[pos].x, in[pos].y); }" | python3...
Hi, would it be possible to have a sample file (code.bin) in the repo? I guess code.bin is generated with compiler_explorer.py. If yes, maybe we can have a sample metal...
NDArrayMatrixMultiplyA16 does not contain simd async copy instructions, although the kernel for A14 does. Starting with AGX3 (A15), there are some new instructions used for GEMM and Conv. I haven't...
I was trying to understand how resource descriptors work on the low level (especially in the context of this [very informative blog post](https://www.gfxstrand.net/faith/blog/2022/08/descriptors-are-hard/)). I hope this is a good place...
Just saying `unsigned` or `signed` is ambiguous, so this attempts to clarify that it refers to whether the offset is sign-extended or zero-extended from 32-bit to 64-bit before adding...
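
To illustrate the distinction in the last item, here is a minimal sketch of how the two kinds of extension change the result of an address computation. It assumes the extended offset is added to a 64-bit base address; the helper names (`sign_extend32`, `zero_extend32`, `effective_address`) are hypothetical and are not taken from the repo or from the hardware documentation.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def zero_extend32(x: int) -> int:
    """Treat the low 32 bits of x as an unsigned value."""
    return x & MASK32


def sign_extend32(x: int) -> int:
    """Treat the low 32 bits of x as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def effective_address(base: int, offset32: int, signed: bool) -> int:
    """Hypothetical model: extend the 32-bit offset, then add it to a 64-bit base."""
    ext = sign_extend32(offset32) if signed else zero_extend32(offset32)
    return (base + ext) & MASK64


base = 0x1_0000_0000
offset = 0xFFFF_FFFC  # bit pattern of -4 as a signed 32-bit integer

signed_addr = effective_address(base, offset, signed=True)
unsigned_addr = effective_address(base, offset, signed=False)

print(hex(signed_addr))    # 0xfffffffc  (base - 4)
print(hex(unsigned_addr))  # 0x1fffffffc (base + 4294967292)
```

The same 32-bit bit pattern yields addresses almost 4 GiB apart, which is why naming the extension mode explicitly is clearer than a bare `signed`/`unsigned` label.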