Polygeist
Add support for splitting ifs with barriers in parallel ops without interchanging them
This PR adds two new ways to handle ifs that contain barriers inside parallel regions.
This is how such an if is currently handled:
parallel {
  A()
  if {
    B()
    barrier
    C()
  }
  D()
}
->
parallel {
  A()
}
if {
  parallel {
    B()
  }
  parallel {
    C()
  }
}
parallel {
  D()
}
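The current lowering can be sketched on a toy IR as follows. This is a minimal illustration, not Polygeist's implementation: the `IfOp` and `Parallel` classes, the string statements, and `distribute_current` are hypothetical names. The sketch ignores the if condition entirely and assumes the if has no else branch and exactly one barrier directly in its body.

```python
from dataclasses import dataclass, field

# Hypothetical toy IR, for illustration only (not Polygeist's MLIR classes):
# plain statements are strings and the string "barrier" marks a barrier.
@dataclass
class IfOp:
    then: list
    orelse: list = field(default_factory=list)

@dataclass
class Parallel:
    body: list

def distribute_current(region):
    """Lower parallel { pre; if { before; barrier; after }; post } by splitting
    the if off from its neighbours, moving it outside the parallel op, and
    splitting its body at the barrier into two parallel regions."""
    idx = next(i for i, s in enumerate(region.body) if isinstance(s, IfOp))
    pre, node, post = region.body[:idx], region.body[idx], region.body[idx + 1:]
    b = node.then.index("barrier")
    before, after = node.then[:b], node.then[b + 1:]
    return [
        Parallel(pre),
        IfOp([Parallel(before), Parallel(after)]),
        Parallel(post),
    ]

result = distribute_current(Parallel(["A", IfOp(["B", "barrier", "C"]), "D"]))
```

Each of A, B, C and D ends up in its own parallel region, which matches the four `parallel` ops in the output above.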
The first splits an if whose body directly contains a barrier at that barrier, keeping the if inside the parallel op instead of isolating the if with barriers and interchanging it with the parallel op:
parallel {
  A()
  if {
    B()
    barrier
    C()
  }
  D()
}
->
parallel {
  A()
  if {
    B()
  }
}
parallel {
  if {
    C()
  }
  D()
}
This is expected to improve performance, because A and B share one parallel region and C and D share another, instead of each getting its own.
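A minimal sketch of this split on the same kind of hypothetical toy IR (`IfOp`, `Parallel`, `ifsplit` and `leaves` are illustrative names, not Polygeist's API). The toy model does not track the condition; in real IR the if in the second parallel region must see the same condition value as the first, which this sketch simply assumes.

```python
from dataclasses import dataclass, field

# Hypothetical toy IR: strings are statements, "barrier" marks a barrier.
@dataclass
class IfOp:
    then: list
    orelse: list = field(default_factory=list)

@dataclass
class Parallel:
    body: list

def ifsplit(region):
    """parallel { pre; if { before; barrier; after }; post }
    -> parallel { pre; if { before } }  parallel { if { after }; post }"""
    idx = next(i for i, s in enumerate(region.body) if isinstance(s, IfOp))
    pre, node, post = region.body[:idx], region.body[idx], region.body[idx + 1:]
    b = node.then.index("barrier")
    before, after = node.then[:b], node.then[b + 1:]
    # The if stays inside each parallel region; only its body is split.
    return [
        Parallel(pre + [IfOp(before)]),
        Parallel([IfOp(after)] + post),
    ]

def leaves(ops):
    """Yield every plain statement in a nested list of ops."""
    for op in ops:
        if isinstance(op, Parallel):
            yield from leaves(op.body)
        elif isinstance(op, IfOp):
            yield from leaves(op.then)
            yield from leaves(op.orelse)
        else:
            yield op

result = ifsplit(Parallel(["A", IfOp(["B", "barrier", "C"]), "D"]))
```

Collecting `leaves(result)` gives each of A, B, C and D exactly once, so this variant produces two parallel regions instead of four without duplicating any code.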
The second hoists the if out of the parallel op and, for each outcome of the condition (true or false), joins the blocks that execute in that case:
parallel {
  A()
  if {
    B()
    barrier
    C()
  }
  D()
}
->
if {
  parallel {
    A()
    B()
  }
  parallel {
    C()
    D()
  }
} else {
  parallel {
    A()
    D()
  }
}
This removes the branch from inside the parallel regions at the cost of larger code, since A and D are duplicated into both branches. The duplication compounds: code size grows exponentially with the number of barriers, so this mode is likely useful only together with a heuristic (not yet implemented) that decides when to apply it.
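A minimal sketch of the hoisting, and of why code size blows up, on a hypothetical toy IR (`IfOp`, `Parallel`, `ifhoist`, `leaves` and `chain` are illustrative names, not Polygeist's API). It assumes each if has no else branch and exactly one barrier, and it ignores the condition entirely, so it glosses over any requirement that the condition be computable outside the parallel op.

```python
from dataclasses import dataclass, field

# Hypothetical toy IR: strings are statements, "barrier" marks a barrier.
@dataclass
class IfOp:
    then: list
    orelse: list = field(default_factory=list)

@dataclass
class Parallel:
    body: list

def ifhoist(body):
    """Hoist the first barrier-containing if out of a parallel body,
    duplicating the surrounding code into both branches, then recurse."""
    idx = next((i for i, s in enumerate(body) if isinstance(s, IfOp)), None)
    if idx is None:
        return [Parallel(body)]
    pre, node, post = body[:idx], body[idx], body[idx + 1:]
    b = node.then.index("barrier")
    before, after = node.then[:b], node.then[b + 1:]
    return [IfOp(
        then=[Parallel(pre + before)] + ifhoist(after + post),  # condition true
        orelse=ifhoist(pre + post),                             # condition false
    )]

def leaves(ops):
    """Yield every plain statement in a nested list of ops."""
    for op in ops:
        if isinstance(op, Parallel):
            yield from leaves(op.body)
        elif isinstance(op, IfOp):
            yield from leaves(op.then)
            yield from leaves(op.orelse)
        else:
            yield op

def chain(k):
    """Parallel body with k sequential ifs, each containing one barrier."""
    body = ["A"]
    for j in range(k):
        body.append(IfOp([f"B{j}", "barrier", f"C{j}"]))
    return body + ["D"]

single = ifhoist(["A", IfOp(["B", "barrier", "C"]), "D"])
growth = [list(leaves(ifhoist(chain(k)))).count("D") for k in range(1, 6)]
```

`single` reproduces the transformation shown above. In `growth`, the trailing statement D is emitted 2, 4, 8, 16 and 32 times: each hoisted if puts the rest of the region into both branches, so the copies double per if.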
Couldn't this instead become the following, which avoids code duplication?
parallel {
  A()
  if {
    B()
  }
}
parallel {
  if {
    C()
  }
  D()
}
One can choose between
parallel {
  A()
  if {
    B()
  }
}
parallel {
  if {
    C()
  }
  D()
}
and
if {
  parallel {
    A()
    B()
  }
  parallel {
    C()
    D()
  }
} else {
  parallel {
    A()
    D()
  }
}
by specifying --cpuify="distribute.ifsplit" or --cpuify="distribute.ifhoist", respectively (the default is still the original transformation).
Compared with the current transformation, neither new mode makes much difference to overall performance across the Rodinia suite as a whole; individual benchmark speedups appear to range from -7% to +4% for ifsplit and from -2% to +2% for ifhoist, some of which may be run-to-run noise.