
Builtin Matrix type

Open · data-man opened this issue 4 years ago · 13 comments

LLVM 10 introduced nice Matrix intrinsics.

Possible syntax:

@Matrix(rows, cols, type)
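
A rough sketch of usage (hypothetical; neither the builtin nor any operations on it exist today):

const Mat4x4 = @Matrix(4, 4, f32); // 4 rows, 4 columns of f32
const Col4 = @Matrix(4, 1, f32);   // a single column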

Related issue: #903

data-man · Apr 06 '20 07:04

Would @Vector(len, T) be equivalent to @Matrix(len, 1, T) (or @Matrix(1, len, T))? If their codegen and memory layout are the same, then we might as well drop @Vector for @Matrix.

Sobeston · Apr 06 '20 08:04

Would @Vector(len, T) be equivalent to @Matrix(len, 1, T) (or @Matrix(1, len, T))?

Yes.

we might as well drop @Vector for @Matrix

Oh, no! Vector's operators are already perfect. :)

data-man · Apr 06 '20 08:04

Perhaps in keeping Vector and Matrix, it would be a good opportunity to consider operations between them.

Sobeston · Apr 06 '20 08:04

Do they have anything other than transpose, multiply, load, and store? Is that useful to add to a language? How much magic would that hide?

To be honest, I'm not a big fan. It's already not clear what @Matrix(rows, cols, type) would look like in memory without reading the LLVM documentation.

BarabasGitHub · Apr 07 '20 16:04

The RFC [1] referenced by the LLVM commit [2] has this to say:

The main motivation for the matrix support on the Clang side is to give users a way to

  • Guarantee generating high-quality code for matrix operations and trees of matrix operations. For isolated operations, we can guarantee vector code generation suitable for the target. For trees of operations, the proposed value type helps with eliminating temporary loads & stores.
  • Make use of specialized matrix ISA extensions, like the new matrix instructions in ARM v8.6 or various proprietary matrix accelerators, in their C/C++ code.
  • Move optimisations from matrix wrapper libraries into the compiler. We use it internally to simplify an Eigen-style matrix library, by relying on LLVM for generating tiled & fused loops for matrix operations.

Clearly the members of the LLVM community (or at least the ones backing this extension) believe that the optimizer can perform better here with the additional information about matrix representation, which to me seems like a valid argument that this should be included in the language. As long as we don't care about being bound more tightly to LLVM (which we don't seem to, given zig c++), I don't see a strong reason not to expose this.

But that still leaves a lot of free space in terms of how it should be exposed. At the LLVM level, there is no Matrix type [3]; the matrix intrinsics operate on Vectors, with additional information supplied at the call site to describe the matrix dimensions.

I do think that there would be concrete benefits to having a Matrix type abstraction for these intrinsics in Zig, though. It would make it much easier to specify the dimensions in one place, and would allow for dimension inference when the compiler determines the result type of a matrix multiply. As long as the language supports a cast between matrices of the same size but different dimensions (which could just be @bitCast), and between matrix types and vector types of the same size, I think a specialized Matrix type is a net win. This also mirrors the decision made by the clang authors, who exposed these intrinsics via typedef float m4x4_t __attribute__((matrix_type(4, 4)));
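
To make the casting story concrete, a rough sketch (hypothetical @Matrix builtin, using the two-argument @bitCast form Zig had at the time):

fn reshape(m: @Matrix(4, 4, f32)) @Matrix(2, 8, f32) {
    // same total size, different dimensions
    return @bitCast(@Matrix(2, 8, f32), m);
}

fn asVector(m: @Matrix(4, 4, f32)) @Vector(16, f32) {
    // matrix <-> vector of the same total element count
    return @bitCast(@Vector(16, f32), m);
}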

It's already not clear what @Matrix(rows, cols, type) would look like in memory without reading the LLVM documentation.

I agree that this is a potential issue. We could make it easier by documenting the layout in the Zig documentation of the @Matrix intrinsic. The LLVM notes seem to suggest that they are considering adding support for multiple layouts, so we could alternatively change the builtin to specify layout explicitly, e.g. @Matrix(rows, cols, type, .COL_MAJOR).

Since a * b is more complex than element-wise multiply, operates on inputs of different types, and may return a third type, I would advise introducing an intrinsic @matrixMultiply(a, b) instead of overloading the * operator. This would also give us a place to specify the other information that can be attached to the LLVM intrinsic, like fast-math flags.
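
For example (hypothetical builtin; the result type falls out of the operand dimensions):

// 4x8 times 8x2 yields 4x2; the compiler could check or infer this at the call site.
fn mul(a: @Matrix(4, 8, f32), b: @Matrix(8, 2, f32)) @Matrix(4, 2, f32) {
    return @matrixMultiply(a, b);
}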

Perhaps in keeping Vector and Matrix, it would be a good opportunity to consider operations between them.

Looking at the LLVM documentation, the Matrix type is internally backed by a Vector type, so @bitCast support (or some specialized cast) definitely makes sense. But for the same reasons I stated above, I don't think we should implement matrix * vector. Since @Vector is semantically a SIMD type, not a mathematical vector type, I also don't think we should make @matrixVectorMultiply(matrix, vector), unless LLVM makes a specialized intrinsic for this specific operation. Instead, if this is needed, @matrixMultiply(matrix, @bitCast(@Matrix(4, 1, f32), vector)) should give all of the code generation benefits without introducing an operator to the language that has unexpected nontrivial cost, or encouraging treating simd vectors like mathematical vectors.

Overall I think our investment in this feature should be parallel to LLVM's. If they start making large improvements to the codegen from these intrinsics, or supporting more new hardware with them, it becomes more worthwhile for us to add support.

[1] RFC: Matrix Math Support, http://lists.llvm.org/pipermail/llvm-dev/2019-October/136240.html
[2] LLVM code review for matrix intrinsics, https://reviews.llvm.org/D70456
[3] LLVM documentation, matrix intrinsics, https://llvm.org/docs/LangRef.html#matrix-intrinsics

SpexGuy · Apr 07 '20 19:04

* We use it internally to simplify an Eigen-style matrix library, by relying on LLVM for generating tiled & fused loops for matrix operations.

Ehm... ok. Not sure what to think of this. This is going the way of Fortran. That doesn't mean it's bad, but I'm also not sure implementing matrix multiplication algorithms is a compiler's job. Maybe I'm overestimating the extent of tiled & fused loops?

* Make use of specialized matrix ISA extensions, like the new matrix instructions in ARM v8.6 or various proprietary matrix accelerators, in their C/C++ code.

This is a valid argument for specialized matrix operations.

* For trees of operations, the proposed value type helps with eliminating temporary loads & stores.

I don't know enough here to have an opinion.

The LLVM notes seem to suggest that they are considering adding support for multiple layouts, so we could alternatively change the builtin to specify layout explicitly, e.g. @Matrix(rows, cols, type, .COL_MAJOR).

Seems like a good solution.

Since a * b is more complex than element-wise multiply, operates on inputs of different types, and may return a third type, I would advise introducing an intrinsic @matrixMultiply(a, b) instead of overloading the * operator.

Agreed. That makes it a lot clearer already than I originally imagined. 👍

BarabasGitHub · Apr 07 '20 21:04

Given the variation in matrix memory layout between architectures (row-major or column-major? Is [|c|]*[|r|]T a matrix? Do we allow multiplies between different index orderings? If so, what's the index order of the result? Where is it stored?), and the implicit memory management inherent to some of them, I really don't think a separate matrix type is wise. If processors ever implement dedicated matrix multiply instructions (not just SIMD instructions to make matrix multiply easier), this can be revisited -- until then, I think the best course of action is to tighten the guarantees around auto-tiling and -fusing of loops.

ghost · Nov 21 '20 08:11

If processors ever implement dedicated matrix multiply instructions (not just SIMD instructions to make matrix multiply easier), this can be revisited

Intel AMX is a new addition to x86 to support matrix operations. Intel Instruction Set Reference (PDF). See chapter 3.

Personally I think this kind of thing is an edge case and should wait until the rest of the language is finished. Also, with the rise of Arm CPUs, perhaps a more sane way of dealing with vector and matrix data will become more common. We can only hope, at any rate.

kyle-github · Nov 21 '20 15:11

One final comment: to be fair to Intel, AMX is a lot more sane than the ever-changing set of SIMD instructions from MMX to AVX-512. But, wow, is that a lot of state. Task switching is going to be painful with the addition of that much state.

kyle-github · Nov 21 '20 15:11

Relating this to an idea from #6771: packed [N][M]T might replace the need for a @Matrix type

Even if Zig supported none of the standard math operators (+, -, *, / etc.) for packed [N][M]T, it would still be very useful for encoding functions or builtins that implement matrix-level intrinsics:

// SIMD Matrix-Multiply-Accumulate on ARMv8.6-A
// Computes A * B' + C, storing the result in C
inline fn arm_ummla(
    A: packed(u128) [2][8]u8,
    B: packed(u128) [2][8]u8,
    C: *packed(u128) [2][2]u32,
) void {
    asm volatile ("ummla %[C], %[A], %[B]"
        : [C] "=*w" (C)
        : [A] "w" (A), [B] "w" (B)
    );
}

The above notation is a bit of shorthand: @TypeOf(A[0]) is actually packed(u64) [8]u8.

The layout of the matrices in memory is implied directly by the existing rules about how packed elements are stored contiguously (specifically, packed matrices are row-major from LSB to MSB in the backing integer).
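
To spell that rule out, a sketch of the implied addressing (the packed(...) array syntax is still the hypothetical one from #6771):

// For M: packed(u128) [2][8]u8, element M[r][c] starts at this bit of the
// backing integer; rows are contiguous, with column 0 at the LSB end.
fn bitOffset(r: usize, c: usize) usize {
    const cols = 8;
    const elem_bits = 8;
    return (r * cols + c) * elem_bits;
}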

topolarity · Jul 15 '22 16:07

FWIW, I don't think we really need a dedicated representation for column-major matrices, even to take advantage of hardware SIMD operations that are defined in terms of column-wise inputs/outputs. Any operation involving column-major matrices is equivalent to an operation on row-major matrices with a few transposes added and some arguments shuffled around.

(Row-major) A * (column-major) B is the same thing as row-major A * transpose(B). Similarly, because (A*B)' = B'*A', column-major matmul(A, B) is the same thing as row-major matmul(B, A) (which means one can implement @matrixMultiply for packed row-major matrices in terms of "llvm.matrix.multiply" without overhead).
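Concretely, a sketch of a column-major multiply built on a row-major intrinsic (hypothetical builtins; f32 shapes chosen for illustration):

// A column-major m-by-n matrix is bit-identical to a row-major n-by-m matrix.
fn colMajorMatmul(
    a: @Matrix(8, 4, f32), // column-major 4x8 A, viewed row-major as A'
    b: @Matrix(2, 8, f32), // column-major 8x2 B, viewed row-major as B'
) @Matrix(2, 4, f32) {     // row-major (A*B)' = column-major 4x2 A*B
    return @matrixMultiply(b, a); // B' * A' = (A*B)'
}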

Column-major indexing still has its conveniences, since you don't have to manually reverse indices (x[i][j][k] becomes x[k][j][i]), but it isn't necessary for describing accelerated matrix ops.

topolarity · Jul 15 '22 16:07

I really want this feature! (Matrix, and maybe Tensor)

ryoppippi · Aug 10 '22 19:08

Just adding a comment similar to my comment in #7295 supporting this proposal. Matrix operations are extremely important in robotics, so having support for them in the language is a big plus for me.

For ergonomic reasons I'd prefer to have all (meaningful) arithmetic operators defined on matrices (+, -, *, and something equivalent to .* from MATLAB or other languages for element-wise multiplication), though I could understand if matrix multiplication were split into a separate compiler builtin (I'd suggest @matMul over @matrixMultiply purely for length).

AdamGoertz · Mar 09 '23 02:03

Perhaps it is better for Matrix to be in a library instead of being a language feature. Taking inspiration from scientific computing libraries like NumPy, Blitz, Armadillo, and Eigen, we could suggest or develop a math/statistics library for all those operations. Some people in academia are even replacing NumPy arrays with PyTorch/TensorFlow tensors to run computations on the GPU and to take advantage of autograd for numerical optimization. Internally, some operations might convert subarrays into @Vector when needed. Do we have BLAS/OpenBLAS/LAPACK alternatives in Zig? Do we have libraries for numerical solvers?

Some matrix operations can be done in place, which is not possible with the '*' syntax. For example, numpy.matmul can take an out argument, which can even be one of the inputs (many NumPy operations have out), so you can reuse memory across operations. I wonder whether the matmul chain "m1 * m2 * ... * mn" (all being @Matrix) might reuse memory between the steps.
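
For illustration, a minimal library-style matmul with an explicit out parameter (plain Zig, no builtins; the names are made up):

fn matmul(
    comptime T: type,
    comptime m: usize,
    comptime k: usize,
    comptime n: usize,
    a: *const [m][k]T,
    b: *const [k][n]T,
    out: *[m][n]T, // caller-provided storage; this naive version requires that out not alias a or b
) void {
    for (0..m) |i| {
        for (0..n) |j| {
            var acc: T = 0;
            for (0..k) |l| acc += a[i][l] * b[l][j];
            out[i][j] = acc;
        }
    }
}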

anderflash · Aug 01 '23 12:08