mlir-dace
failed to legalize operation 'llvm.mlir.addressof'
I am trying to build a full C-to-DCIR optimization pipeline with llvm/Polygeist and sdfg-opt, but sdfg-opt does not appear to recognize the llvm dialect, which is needed to represent C-style strings.
Reproduction steps
Use the following C source as the input:
// main.c
#include <stdio.h>
#include <stdlib.h>

static int example() {
  int *A = (int *)malloc(100000 * sizeof(int));
  int *B = (int *)malloc(100000 * sizeof(int));
  for (int i = 0; i < 100000; ++i) {
    A[i] = 5;
    for (int j = 0; j < 100000; ++j)
      B[j] = A[i];
    for (int j = 0; j < 10000; ++j)
      A[j] = A[i];
  }
  int res = B[0];
  free(A);
  free(B);
  return res;
}

int main() {
  printf("%d\n", example());
  return 0;
}
Compile it to MLIR with cgeist:
cgeist -I/usr/lib/llvm-14/lib/clang/14.0.0/include/ -S -o main.s main.c
The resulting main.s file has the following content:
module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f80, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu", "polygeist.target-cpu" = "x86-64", "polygeist.target-features" = "+cx8,+fxsr,+mmx,+sse,+sse2,+x87", "polygeist.tune-cpu" = "generic"} {
  llvm.mlir.global internal constant @str0("%d\0A\00") {addr_space = 0 : i32}
  llvm.func @printf(!llvm.ptr<i8>, ...) -> i32
  func.func @main() -> i32 attributes {llvm.linkage = #llvm.linkage<external>} {
    %c5_i32 = arith.constant 5 : i32
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %c100000 = arith.constant 100000 : index
    %c10000 = arith.constant 10000 : index
    %c0_i32 = arith.constant 0 : i32
    %0 = llvm.mlir.addressof @str0 : !llvm.ptr<array<4 x i8>>
    %1 = llvm.getelementptr %0[0, 0] : (!llvm.ptr<array<4 x i8>>) -> !llvm.ptr<i8>
    %alloc = memref.alloc() : memref<100000xi32>
    scf.for %arg0 = %c0 to %c100000 step %c1 {
      memref.store %c5_i32, %alloc[%arg0] : memref<100000xi32>
      scf.for %arg1 = %c0 to %c10000 step %c1 {
        %3 = memref.load %alloc[%arg0] : memref<100000xi32>
        memref.store %3, %alloc[%arg1] : memref<100000xi32>
      }
    }
    memref.dealloc %alloc : memref<100000xi32>
    %2 = llvm.call @printf(%1, %c5_i32) : (!llvm.ptr<i8>, i32) -> i32
    return %c0_i32 : i32
  }
}
Attempt to run sdfg-opt over main.s:
sdfg-opt --convert-to-sdfg main.s
The following error is produced:
main.s:11:10: error: failed to legalize operation 'llvm.mlir.addressof'
    %0 = llvm.mlir.addressof @str0 : !llvm.ptr<array<4 x i8>>
         ^
main.s:11:10: note: see current operation: %12 = "llvm.mlir.addressof"() {global_name = @str0} : () -> !llvm.ptr<array<4 x i8>>
As far as I understand, without support for the llvm dialect it is impossible to use any C string, and C strings are often required for serialized input/output in a complete program.
Will support for this be added?
We are indeed working on an experimental llvm dialect integration. In fact, we ran into the same issue with printf that you describe.
Our temporary hack was to add a special case in the converter that turns llvm.call @printf into an annotated tasklet.
The main problem is that splitting llvm operations into individual tasklets is very inefficient for printing. Ideally we would group them together, but recognizing which operations to group is not trivial.
For now, we suggest avoiding I/O in the code you convert and using DCIR only for numerical optimizations. As you have already identified, this requires wrapper code to run the SDFG.
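A minimal sketch of that split, assuming the numerical part of main.c is moved into a function with no string literals and no printf call. The names example_kernel and run_example are hypothetical, and it has not been verified that sdfg-opt --convert-to-sdfg accepts the kernel. The intent is only that cgeist should no longer need llvm.mlir.global or llvm.mlir.addressof for it. In practice the kernel would live in its own file that goes through cgeist and sdfg-opt, while the wrapper is built with an ordinary C compiler.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical I/O-free kernel: the loops from example() with the buffers
   passed in. It contains no string literals and no printf, so this is the
   part intended for cgeist / sdfg-opt. Requires n > 0 and m <= n. */
int example_kernel(int *A, int *B, int n, int m) {
  for (int i = 0; i < n; ++i) {
    A[i] = 5;
    for (int j = 0; j < n; ++j)
      B[j] = A[i];
    for (int j = 0; j < m; ++j)
      A[j] = A[i];
  }
  return B[0];
}

/* Hypothetical wrapper: owns allocation and I/O, which stay outside the
   converted code. Prints and returns the kernel result, or returns -1 if
   allocation fails. */
int run_example(int n, int m) {
  int *A = malloc((size_t)n * sizeof(int));
  int *B = malloc((size_t)n * sizeof(int));
  if (A == NULL || B == NULL) {
    free(A); /* free(NULL) is a no-op */
    free(B);
    return -1;
  }
  int res = example_kernel(A, B, n, m);
  printf("%d\n", res);
  free(A);
  free(B);
  return res;
}
```

With the original sizes, main would simply call run_example(100000, 10000) and return 0. Keeping malloc in the wrapper is a design choice rather than a requirement: cgeist lowered the original malloc calls to memref.alloc, so the kernel could also allocate internally.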