llvmlite
A way to discover the label of a block (in ValueRef)
br instructions have an argument telling what label(s) they are branching to, and phi instructions have arguments telling what labels they might have arrived from, but there is no way to match up these labels with the blocks they belong to.
I am trying to do control-flow analysis, including determining how many different paths are possible through any given block, and which blocks are located inside loops. Code sample:
int sample(int arg) {
    for(int i=0; i<10; i++) {
        arg = (2*arg + 3*i) % 71;
    }
    return arg;
}
Compile this with -O1 and generate LL output, then use llvm.parse_assembly(). In the i32 @sample(i32) function there will be at least three blocks and a phi statement that "comes from" two of the blocks. Task: write a Python program that uses llvmlite to answer the question: "Which two blocks are the ones that might branch to that phi statement?"
Disclaimer: I don't know how to do this within the "real" LLVM API, and I am new to Python 3 programming. I don't see the answer in the (rather limited) documentation of the ValueRef class.
I am using Python 3.6.8, llvmlite 0.33.0, and clang 9.0.1 on a CentOS system. I built the clang from source but I don't know how to build python3 or llvmlite.
@mrob27 thanks for submitting this! I have labelled it as a question for now.
I don't think that there's an API exposed in llvmlite to obtain the CFG as a structure for further parsing/manipulation, though it's possible to view the CFG like this:
- Take your code sample and do e.g. clang -O1 -S -emit-llvm llvmlite_603.c -o llvmlite_603.ll to get some LLVM IR as text.
- Do this:
from llvmlite import binding as ll
from ctypes import CFUNCTYPE, c_int
import llvmlite.binding as llvm

with open('llvmlite_603.ll') as f:
    llvm_ir = f.read()

# All these initializations are required for code generation!
llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()  # yes, even this one

def create_execution_engine():
    """
    Create an ExecutionEngine suitable for JIT code generation on
    the host CPU. The engine is reusable for an arbitrary number of
    modules.
    """
    # Create a target machine representing the host
    target = llvm.Target.from_default_triple()
    target_machine = target.create_target_machine()
    # And an execution engine with an empty backing module
    backing_mod = llvm.parse_assembly("")
    engine = llvm.create_mcjit_compiler(backing_mod, target_machine)
    return engine

def compile_ir(engine, llvm_ir):
    """
    Compile the LLVM IR string with the given engine.
    The compiled module object is returned.
    """
    # Create a LLVM module object from the IR
    mod = llvm.parse_assembly(llvm_ir)
    mod.verify()
    # Now add the module and make sure it is ready for execution
    engine.add_module(mod)
    engine.finalize_object()
    engine.run_static_constructors()
    return mod

engine = create_execution_engine()
mod = compile_ir(engine, llvm_ir)

fname = "sample"
func_ptr = engine.get_function_address(fname)

# Run the function via ctypes
cfunc = CFUNCTYPE(c_int, c_int)(func_ptr)
res = cfunc(109)
print("%s(...) =" % fname, res)

fn = mod.get_function(fname)
CFG = ll.get_function_cfg(fn)
ll.view_dot_graph(CFG).view()
RE:
I am using Python 3.6.8, llvmlite 0.33.0, and clang 9.0.1 on a CentOS system. I built the clang from source but I don't know how to build python3 or llvmlite.
Installation options are highlighted here: http://llvmlite.pydata.org/en/latest/admin-guide/install.html#installation
Hope this helps?
Thanks for the extensive reply -- but it does not answer my question, and I think perhaps the approach you're suggesting wouldn't work anyway.
Since I don't have graphics ability (I am working through a text-only connection), the llvmlite.binding.view_dot_graph(graph, ...) method doesn't do anything. But looking at your program, it appears that it is executing the function with an argument (in this case 109). Though that value works with the sample function I gave, in general my program will not know what argument values might be typical or valid. I must treat the function as a black box. Some functions will have very slow execution times, and some may have multiple branches that are executed only in certain cases. Dynamic analysis is not an option, I need to use static analysis.
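One text-only detail worth noting: get_function_cfg() returns the CFG as DOT source in a plain string, so the edges can be recovered without any graphics at all by scanning for the "->" arrows. A minimal sketch; the dot_edges helper and the sample DOT snippet below are illustrative, not part of llvmlite:

```python
import re

def dot_edges(dot_src):
    """Return (source, target) node-id pairs found in DOT source text."""
    return re.findall(r'(\w+)\s*->\s*(\w+)', dot_src)

# Illustrative DOT text shaped like get_function_cfg() output;
# the node ids and label contents here are made up for the example.
dot = """
digraph "CFG for 'sample' function" {
    Node0xA [shape=record, label="{%0:\\l  br label %1\\l}"];
    Node0xB [shape=record, label="{%1:\\l  ...\\l}"];
    Node0xA -> Node0xB;
    Node0xB -> Node0xB;
}
"""
print(dot_edges(dot))  # [('Node0xA', 'Node0xB'), ('Node0xB', 'Node0xB')]
```

Mapping the DOT node ids back to block labels would still require parsing the label="..." attributes, but the edge list alone is often enough for path counting.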
Returning to my question, I have a program that accepts an LL file as input; it uses parse_assembly, which gives a ValueRef that is a module; it iterates over its functions; for each function it iterates over its blocks; and for each block it can look at the first instruction (checking whether its opcode is "phi"), and if so I want it to figure out which of the function's blocks might branch to that phi. I think labels are the way to do that, as that's what is in the actual LL file. Thus, the question was asking how to find out the label of each block and how to find out the label(s) to which a "br" instruction branches.
I am in the same boat. Could the "name" field of ValueRef be repurposed to hold the block label?
Some of the examples actually suggest that the basic block's name contains something meaningful, but in my experiments it always returns "".
https://github.com/numba/llvmlite/blob/1f25fa723e419e77c325a135e10acfb9b6112e8f/examples/llvmir_iter.py#L39
I also want this feature.
Until then, you can parse the label out of the first non-empty line of the string representation of a block:

import re

def get_block_label(block):
    # str(block) starts with a possibly-empty line, then the label line,
    # e.g. "5:        ; preds = %7, %4" or "entry:".
    # Returns None for a block with no explicit label (e.g. the entry block).
    for line in str(block).split('\n'):
        if line.strip():
            m = re.match(r'^([\w.$-]+):', line)
            return m.group(1) if m else None
    return None
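The same text-parsing trick covers the other half of the question, the targets of a "br" instruction. br_target_labels below is a hypothetical helper along the same lines; combined with a block-label lookup it gives enough to build a label-level CFG:

```python
import re

def br_target_labels(br_text):
    """Return the labels a textual 'br' instruction can branch to."""
    return re.findall(r'label\s+%([\w.$-]+)', br_text)

# An unconditional branch has one target, a conditional branch has two:
print(br_target_labels('br label %1'))                    # ['1']
print(br_target_labels('br i1 %9, label %10, label %4'))  # ['10', '4']

# With llvmlite (not runnable here), per-block edges could be collected as:
#   for block in fn.blocks:
#       last = list(block.instructions)[-1]
#       if last.opcode == 'br':
#           edges[get_block_label(block)] = br_target_labels(str(last))
```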