mlir-dace

failed to legalize operation 'llvm.mlir.addressof'

Open: iBug opened this issue 2 years ago • 1 comment

I am trying to build a full C-to-DCIR optimization pipeline with llvm/Polygeist and sdfg-opt, but sdfg-opt does not appear to recognize the llvm dialect, which Polygeist uses to represent C-style strings.

Reproduction steps

Use the following C source as the input:

// main.c
#include <stdio.h>
#include <stdlib.h>

static int example() {
    int *A = (int *)malloc(100000 * sizeof(int));
    int *B = (int *)malloc(100000 * sizeof(int));
    for (int i = 0; i < 100000; ++i) {
        A[i] = 5;
        for (int j = 0; j < 100000; ++j)
            B[j] = A[i];
        for (int j = 0; j < 10000; ++j)
            A[j] = A[i];
    }
    int res = B[0];
    free(A);
    free(B);
    return res;
}

int main() {
    printf("%d\n", example());
    return 0;
}

Compile into MLIR using cgeist:

cgeist -I/usr/lib/llvm-14/lib/clang/14.0.0/include/ -S -o main.s main.c

The resulting main.s file has the following content:

module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<i64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f80, dense<128> : vector<2xi32>>, #dlti.dl_entry<i1, dense<8> : vector<2xi32>>, #dlti.dl_entry<i8, dense<8> : vector<2xi32>>, #dlti.dl_entry<i16, dense<16> : vector<2xi32>>, #dlti.dl_entry<i32, dense<32> : vector<2xi32>>, #dlti.dl_entry<f16, dense<16> : vector<2xi32>>, #dlti.dl_entry<f64, dense<64> : vector<2xi32>>, #dlti.dl_entry<f128, dense<128> : vector<2xi32>>>, llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu", "polygeist.target-cpu" = "x86-64", "polygeist.target-features" = "+cx8,+fxsr,+mmx,+sse,+sse2,+x87", "polygeist.tune-cpu" = "generic"} {
  llvm.mlir.global internal constant @str0("%d\0A\00") {addr_space = 0 : i32}
  llvm.func @printf(!llvm.ptr<i8>, ...) -> i32
  func.func @main() -> i32 attributes {llvm.linkage = #llvm.linkage<external>} {
    %c5_i32 = arith.constant 5 : i32
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %c100000 = arith.constant 100000 : index
    %c10000 = arith.constant 10000 : index
    %c0_i32 = arith.constant 0 : i32
    %0 = llvm.mlir.addressof @str0 : !llvm.ptr<array<4 x i8>>
    %1 = llvm.getelementptr %0[0, 0] : (!llvm.ptr<array<4 x i8>>) -> !llvm.ptr<i8>
    %alloc = memref.alloc() : memref<100000xi32>
    scf.for %arg0 = %c0 to %c100000 step %c1 {
      memref.store %c5_i32, %alloc[%arg0] : memref<100000xi32>
      scf.for %arg1 = %c0 to %c10000 step %c1 {
        %3 = memref.load %alloc[%arg0] : memref<100000xi32>
        memref.store %3, %alloc[%arg1] : memref<100000xi32>
      }
    }
    memref.dealloc %alloc : memref<100000xi32>
    %2 = llvm.call @printf(%1, %c5_i32) : (!llvm.ptr<i8>, i32) -> i32
    return %c0_i32 : i32
  }
}

Attempt to run sdfg-opt over main.s:

sdfg-opt --convert-to-sdfg main.s

The following error is produced:

main.s:11:10: error: failed to legalize operation 'llvm.mlir.addressof'
    %0 = llvm.mlir.addressof @str0 : !llvm.ptr<array<4 x i8>>
         ^
main.s:11:10: note: see current operation: %12 = "llvm.mlir.addressof"() {global_name = @str0} : () -> !llvm.ptr<array<4 x i8>>

As far as I understand, unless sdfg-opt recognizes the llvm dialect, it is impossible to use any C string, and C strings are often required for input/output in a complete program.

Will any support be added for this?

iBug, Sep 24 '23 09:09

We are indeed working on an experimental integration of the llvm dialect. In fact, we ran into the same issue with printf that you describe. Our temporary hack was to special-case llvm.call @printf in the converter and turn it into an annotated tasklet. The main issue is that splitting llvm operations into individual tasklets is very inefficient for printing. Ideally, we would group them together, but recognizing which operations to group is not trivial. For now, we suggest avoiding any I/O and using DCIR for numerical optimizations. As you have already identified, this requires wrapper code to run the SDFG.
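One way to follow the "avoid any I/O" suggestion is to move the numerical loop nest from main.c into its own translation unit that never touches printf, so that the file handed to cgeist should not need a string global. The sketch below is only an illustration under stated assumptions: the file name kernel.c, the function name example_kernel, and the size parameters n and m are hypothetical and do not appear in the original reproduction, and whether cgeist emits no llvm operations at all for this file has not been verified here.

```c
/* kernel.c: hypothetical I/O-free translation unit (name is an assumption).
 * Same loop nest as example() in main.c, but with the sizes passed in
 * and no <stdio.h>, so no C string literal is required. */
#include <stdlib.h>

/* Returns B[0] after the loop nest, or -1 on invalid sizes or failed
 * allocation. n is the array length; m (<= n) bounds the last inner loop. */
int example_kernel(int n, int m) {
    if (n <= 0 || m < 0 || m > n)
        return -1;

    int *A = (int *)malloc((size_t)n * sizeof(int));
    int *B = (int *)malloc((size_t)n * sizeof(int));
    if (A == NULL || B == NULL) {
        free(A); /* free(NULL) is a no-op, so this is safe */
        free(B);
        return -1;
    }

    for (int i = 0; i < n; ++i) {
        A[i] = 5;
        for (int j = 0; j < n; ++j)
            B[j] = A[i];
        for (int j = 0; j < m; ++j)
            A[j] = A[i];
    }

    int res = B[0];
    free(A);
    free(B);
    return res;
}
```

A separate wrapper file, compiled with a regular C compiler, would then call example_kernel(100000, 10000) and pass the result to printf. How the generated SDFG is built and linked against such a wrapper depends on the DaCe toolchain and is not shown here.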

Berke-Ates, Sep 25 '23 18:09