llvmlite
A way to discover the label of a block (in ValueRef)
br instructions have an argument telling what label(s) they are branching to, and phi instructions have arguments telling what labels they might have arrived from, but there is no way to match up these labels with the blocks they belong to.
I am trying to do control-flow analysis, including determining how many different paths are possible through any given block, and which blocks are located inside loops. Code sample:
int sample(int arg) {
    for(int i=0; i<10; i++) {
        arg = (2*arg + 3*i) % 71;
    }
    return arg;
}
Compile this with -O1 and generate LL output, then use llvm.parse_assembly(). In the i32 @sample(i32) function there will be at least three blocks and a phi statement that "comes from" two of the blocks. Task: write a Python program that uses llvmlite to answer the question: "Which two blocks are the ones that might branch to that phi statement?"
Disclaimer: I don't know how to do this within the "real" LLVM API, and I am new to Python 3 programming. I don't see the answer in the (rather limited) documentation of the ValueRef class.
I am using Python 3.6.8, llvmlite 0.33.0, and clang 9.0.1 on a CentOS system. I built the clang from source but I don't know how to build python3 or llvmlite.
@mrob27 thanks for submitting this! I have labelled it as a question for now.
I don't think that there's an API exposed in llvmlite to obtain the CFG as a structure for further parsing/manipulation, though it's possible to view the CFG like this:
- Take your code sample and do e.g. clang -O1 -S -emit-llvm llvmlite_603.c -o llvmlite_603.ll to get some LLVM IR as text.
- Do this:
from llvmlite import binding as ll
from ctypes import CFUNCTYPE, c_int
import llvmlite.binding as llvm

with open('llvmlite_603.ll') as f:
    llvm_ir = f.read()

# All these initializations are required for code generation!
llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()  # yes, even this one

def create_execution_engine():
    """
    Create an ExecutionEngine suitable for JIT code generation on
    the host CPU. The engine is reusable for an arbitrary number of
    modules.
    """
    # Create a target machine representing the host
    target = llvm.Target.from_default_triple()
    target_machine = target.create_target_machine()
    # And an execution engine with an empty backing module
    backing_mod = llvm.parse_assembly("")
    engine = llvm.create_mcjit_compiler(backing_mod, target_machine)
    return engine

def compile_ir(engine, llvm_ir):
    """
    Compile the LLVM IR string with the given engine.
    The compiled module object is returned.
    """
    # Create a LLVM module object from the IR
    mod = llvm.parse_assembly(llvm_ir)
    mod.verify()
    # Now add the module and make sure it is ready for execution
    engine.add_module(mod)
    engine.finalize_object()
    engine.run_static_constructors()
    return mod

engine = create_execution_engine()
mod = compile_ir(engine, llvm_ir)

fname = "sample"
func_ptr = engine.get_function_address(fname)

# Run the function via ctypes
cfunc = CFUNCTYPE(c_int, c_int)(func_ptr)
res = cfunc(109)
print("%s(...) =" % fname, res)

fn = mod.get_function(fname)
CFG = ll.get_function_cfg(fn)
ll.view_dot_graph(CFG).view()
RE:
I am using Python 3.6.8, llvmlite 0.33.0, and clang 9.0.1 on a CentOS system. I built the clang from source but I don't know how to build python3 or llvmlite.
Installation options are highlighted here: http://llvmlite.pydata.org/en/latest/admin-guide/install.html#installation
Hope this helps?
Thanks for the extensive reply -- but it does not answer my question, and I think perhaps the approach you're suggesting wouldn't work anyway.
Since I don't have graphics ability (I am working through a text-only connection), the llvmlite.binding.view_dot_graph(graph, ...) method doesn't do anything. But looking at your program, it appears that it is executing the function with an argument (in this case 109). Though that value works with the sample function I gave, in general my program will not know what argument values might be typical or valid. I must treat the function as a black box. Some functions will have very slow execution times, and some may have multiple branches that are executed only in certain cases. Dynamic analysis is not an option, I need to use static analysis.
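One text-only detail worth noting: get_function_cfg() returns the CFG as DOT source in a plain string, so the edges can be recovered without any graphics at all by scanning for the "->" arrows. A minimal sketch; the dot_edges helper and the sample DOT snippet below are illustrative, not part of llvmlite:

```python
import re

def dot_edges(dot_src):
    """Return (source, target) node-id pairs found in DOT source text."""
    return re.findall(r'(\w+)\s*->\s*(\w+)', dot_src)

# Illustrative DOT text shaped like get_function_cfg() output;
# the node ids and label contents here are made up for the example.
dot = """
digraph "CFG for 'sample' function" {
    Node0xA [shape=record, label="{%0:\\l  br label %1\\l}"];
    Node0xB [shape=record, label="{%1:\\l  ...\\l}"];
    Node0xA -> Node0xB;
    Node0xB -> Node0xB;
}
"""
print(dot_edges(dot))  # [('Node0xA', 'Node0xB'), ('Node0xB', 'Node0xB')]
```

Mapping the DOT node ids back to block labels would still require parsing the label="..." attributes, but the edge list alone is often enough for path counting.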
Returning to my question, I have a program that accepts an LL file as input; it uses parse_assembly, which gives a ValueRef that is a module; it iterates over its functions; for each function it iterates over its blocks; and for each block it can look at the first instruction (checking whether its opcode is "phi"), and if so I want it to figure out which of the function's blocks might branch to that phi. I think labels are the way to do that, as that's what is in the actual LL file. Thus, the question was asking how to find out the label of each block and how to find out the label(s) to which a "br" instruction branches.
I am in the same boat. Could the "name" field of ValueRef be repurposed to hold the block label?
Some of the examples actually suggest that the basic block's name contains something meaningful, but in my experiments it always returns "".
https://github.com/numba/llvmlite/blob/1f25fa723e419e77c325a135e10acfb9b6112e8f/examples/llvmir_iter.py#L39
I also want this feature.
Until then, you can parse the label out of the first non-empty line of the string representation of a block:

import re

def get_block_label(block):
    # str(block) starts with a possibly-empty line, then the label line,
    # e.g. "5:        ; preds = %7, %4" or "entry:".
    # Returns None for a block with no explicit label (e.g. the entry block).
    for line in str(block).split('\n'):
        if line.strip():
            m = re.match(r'^([\w.$-]+):', line)
            return m.group(1) if m else None
    return None
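The same text-parsing trick covers the other half of the question, the targets of a "br" instruction. br_target_labels below is a hypothetical helper along the same lines; combined with a block-label lookup it gives enough to build a label-level CFG:

```python
import re

def br_target_labels(br_text):
    """Return the labels a textual 'br' instruction can branch to."""
    return re.findall(r'label\s+%([\w.$-]+)', br_text)

# An unconditional branch has one target, a conditional branch has two:
print(br_target_labels('br label %1'))                    # ['1']
print(br_target_labels('br i1 %9, label %10, label %4'))  # ['10', '4']

# With llvmlite (not runnable here), per-block edges could be collected as:
#   for block in fn.blocks:
#       last = list(block.instructions)[-1]
#       if last.opcode == 'br':
#           edges[get_block_label(block)] = br_target_labels(str(last))
```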