Avoid using aggregate types when lowering Hamiltonian
When lowering Hamiltonians, we currently build an aggregate value with llvm.insertvalue and then store that whole aggregate to a pointer obtained from llvm.alloca. We could instead skip the aggregate value and store each field individually through the alloca pointer. For example, the following input:
func.func @hamiltonian(%obs : !quantum.obs, %p1 : memref<1xf64>, %p2 : memref<3xf64>) {
  quantum.hamiltonian(%p1 : memref<1xf64>) %obs : !quantum.obs
  return
}
currently lowers to:
module {
  llvm.func @__catalyst__qis__HamiltonianObs(!llvm.ptr, i64, ...) -> i64
  llvm.func @hamiltonian(%arg0: i64, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: !llvm.ptr, %arg7: !llvm.ptr, %arg8: i64, %arg9: i64, %arg10: i64) {
    %0 = llvm.mlir.constant(1 : i64) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr
    %2 = llvm.mlir.undef : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %3 = llvm.insertvalue %arg1, %2[0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %4 = llvm.insertvalue %arg2, %3[1] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %5 = llvm.insertvalue %arg3, %4[2] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %6 = llvm.insertvalue %arg4, %5[3, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %7 = llvm.insertvalue %arg5, %6[4, 0] : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    %8 = llvm.mlir.constant(1 : i64) : i64
    llvm.store %7, %1 : !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>, !llvm.ptr
    %9 = llvm.call @__catalyst__qis__HamiltonianObs(%1, %8, %arg0) vararg(!llvm.func<i64 (ptr, i64, ...)>) : (!llvm.ptr, i64, i64) -> i64
    llvm.return
  }
}
whereas we could instead produce:
module {
  llvm.func @__catalyst__qis__HamiltonianObs(!llvm.ptr, i64, ...) -> i64
  llvm.func @hamiltonian(%arg0: i64, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: !llvm.ptr, %arg7: !llvm.ptr, %arg8: i64, %arg9: i64, %arg10: i64) {
    %0 = llvm.mlir.constant(1 : i64) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr
    // compute the address of each struct field with llvm.getelementptr
    // store each of %arg1..%arg5 directly to its field address
    %8 = llvm.mlir.constant(1 : i64) : i64
    %9 = llvm.call @__catalyst__qis__HamiltonianObs(%1, %8, %arg0) vararg(!llvm.func<i64 (ptr, i64, ...)>) : (!llvm.ptr, i64, i64) -> i64
    llvm.return
  }
}
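As a rough sketch of what the two placeholder comments in the proposed output could expand to, the version below spells out one llvm.getelementptr and one llvm.store per field of the memref descriptor. This is an illustration rather than actual compiler output: the SSA value numbering, the use of inbounds, and the choice to emit a GEP even for field 0 are assumptions, and the exact printed syntax may differ between MLIR versions.

```mlir
module {
  llvm.func @__catalyst__qis__HamiltonianObs(!llvm.ptr, i64, ...) -> i64
  llvm.func @hamiltonian(%arg0: i64, %arg1: !llvm.ptr, %arg2: !llvm.ptr, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: !llvm.ptr, %arg7: !llvm.ptr, %arg8: i64, %arg9: i64, %arg10: i64) {
    %0 = llvm.mlir.constant(1 : i64) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr
    // Field 0: allocated pointer.
    %2 = llvm.getelementptr inbounds %1[0, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg1, %2 : !llvm.ptr, !llvm.ptr
    // Field 1: aligned pointer.
    %3 = llvm.getelementptr inbounds %1[0, 1] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg2, %3 : !llvm.ptr, !llvm.ptr
    // Field 2: offset.
    %4 = llvm.getelementptr inbounds %1[0, 2] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg3, %4 : i64, !llvm.ptr
    // Field 3, element 0: size of the single dimension.
    %5 = llvm.getelementptr inbounds %1[0, 3, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg4, %5 : i64, !llvm.ptr
    // Field 4, element 0: stride of the single dimension.
    %6 = llvm.getelementptr inbounds %1[0, 4, 0] : (!llvm.ptr) -> !llvm.ptr, !llvm.struct<(ptr, ptr, i64, array<1 x i64>, array<1 x i64>)>
    llvm.store %arg5, %6 : i64, !llvm.ptr
    // Number of coefficients passed to the runtime call.
    %7 = llvm.mlir.constant(1 : i64) : i64
    %8 = llvm.call @__catalyst__qis__HamiltonianObs(%1, %7, %arg0) vararg(!llvm.func<i64 (ptr, i64, ...)>) : (!llvm.ptr, i64, i64) -> i64
    llvm.return
  }
}
```

Here the struct type appears only as the element type of the alloca and of each GEP, so no value of aggregate type is ever created, loaded, or stored. The cost is more operations in the emitted IR (five GEP/store pairs instead of five insertvalue ops and one store).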
This appears to be more in line with LLVM's Performance Tips for Frontend Authors, which states:
Avoid creating values of aggregate types (i.e. structs and arrays). In particular, avoid loading and storing them, or manipulating them with insertvalue and extractvalue instructions. Instead, only load and store individual fields of the aggregate. There are some exceptions to this rule:
- It is fine to use values of aggregate type in global variable initializers.
- It is fine to return structs, if this is done to represent the return of multiple values in registers.
- It is fine to work with structs returned by LLVM intrinsics, such as the with.overflow family of intrinsics.
- It is fine to use aggregate types without creating values. For example, they are commonly used in getelementptr instructions or attributes like sret.