xls
xls copied to clipboard
Enhance XLS optimizer to prefer sequential vs parallel series of adds/subtracts after analysis.
Currently, the XLS optimizer A: Does not optimize the following
top fn test_func(x: bits[32], y: bits[32], a: bits[32]) -> (bits[32], bits[32], bits[32]) {
x1: bits[32] = add(x, a, id=4)
x2: bits[32] = add(x1, a, id=5)
x3: bits[32] = add(x2, a, id=6)
y1: bits[32] = sub(y, x1, id=7)
y2: bits[32] = sub(y, x2, id=8)
y3: bits[32] = sub(y, x3, id=9)
ret val: (bits[32], bits[32], bits[32]) = tuple(y1, y2, y3, id=10)
}
B: Optimizes the following to C.
top fn test_func(x: bits[32], y: bits[32], a: bits[32]) -> (bits[32], bits[32], bits[32]) {
tmp : bits[32] = sub(x, y)
y1 : bits[32] = sub(tmp, a)
y2 : bits[32] = sub(y1, a)
y3 : bits[32] = sub(y2, a)
ret val : (bits[32], bits[32], bits[32]) = tuple(y1, y2, y3)
}
C
top fn test_func(x: bits[32], y: bits[32], a: bits[32]) -> (bits[32], bits[32], bits[32]) {
add.16: bits[32] = add(a, a, id=16)
add.13: bits[32] = add(y, a, id=13)
add.17: bits[32] = add(y, add.16, id=17)
add.15: bits[32] = add(add.13, add.16, id=15)
y1: bits[32] = sub(x, add.13, id=20)
y2: bits[32] = sub(x, add.17, id=18)
y3: bits[32] = sub(x, add.15, id=12)
ret val: (bits[32], bits[32], bits[32]) = tuple(y1, y2, y3, id=8)
}
While B is a fully sequential sequence of subtracts, it is also the one with the minimum number of operations, with critical path not too much different than either A or C. This issue is to enhance XLS to
A. Factor out the common y-x subexpression to come up with the B implementation and B. Be able to perform an analysis and prefer a structure like B over C.