Parallel execution within an SDFG connected component
Describe the bug
When executing a connected component of an SDFG state, independent "subcomponents" within it are not executed in parallel.
To Reproduce
Consider the following DaCe program:
```python
import dace
import numpy as np

N = dace.symbol('N', dace.int32)

@dace.program
def prog(x: dace.float32[N], y: dace.float32[N], v: dace.float32[N], w: dace.float32[N]):
    return np.dot(x, y) + np.dot(v, w)

size = 16
x = np.random.rand(size).astype(np.float32)
y = np.random.rand(size).astype(np.float32)
v = np.random.rand(size).astype(np.float32)
w = np.random.rand(size).astype(np.float32)

sdfg = prog.to_sdfg()
res = sdfg(x=x, y=y, v=v, w=w, N=size)
assert np.allclose(res, np.dot(x, y) + np.dot(v, w))
```
It computes `res = np.dot(x, y) + np.dot(v, w)`. The two dot products are independent, but the generated code shows that they are executed sequentially, one after the other:
```cpp
void __program_prog_internal(/* ... */) {
    // ...
    _Dot__sdfg_1_0_0_10(__state, &x[0], &y[0], __tmp0, N);
    _Dot__sdfg_1_0_0_10(__state, &v[0], &w[0], __tmp1, N);
}
```
Expected behavior
The two dot products should be executed in parallel, e.g., via OpenMP sections.
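To make the expected semantics concrete, here is a small fork/join sketch in plain Python. It is an analogy only, not DaCe's code generator or API: each dot product runs as its own task, the way each would run in its own `#pragma omp section`, and the final addition waits for both, like the implicit barrier at the end of an OpenMP `sections` region. The names `dot` and `prog_parallel` are hypothetical, and because of the GIL this shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(a, b):
    """Plain-Python dot product (hypothetical stand-in for the expanded Dot node)."""
    return sum(p * q for p, q in zip(a, b))


def prog_parallel(x, y, v, w):
    # Fork: each independent dot product is submitted as its own task,
    # analogous to one "#pragma omp section" per dot product.
    with ThreadPoolExecutor(max_workers=2) as pool:
        tmp0 = pool.submit(dot, x, y)
        tmp1 = pool.submit(dot, v, w)
        # Join: result() blocks until each task has finished, analogous to
        # the implicit barrier at the end of an OpenMP "sections" region.
        a, b = tmp0.result(), tmp1.result()
    # The addition runs only after both dot products are done.
    return a + b


print(prog_parallel([1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]))  # 94.0
```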
Note: the `openmp_sections` config flag is already set to `true`, but, judging from its description, it seems to refer to parallel execution between connected components, not within a single connected component.
This could also be implemented as an ad-hoc transformation that, in the simplest case, uses state fission to create states whose independent parts end up as separate connected components.
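A minimal sketch of the analysis such a transformation might start from, assuming the state is given as a plain list of directed `(src, dst)` edges rather than a DaCe `SDFGState` (no DaCe API is used here). It collects every node upstream of the node that joins the two results (a hypothetical `add` tasklet), then splits that upstream set into weakly connected components. Each component could be moved by state fission into a preceding state, where it would become a separate connected component and, per the description of `openmp_sections`, presumably eligible for the existing parallelization. Node names such as `dot0` and `tmp0` are hypothetical stand-ins for the expanded `Dot` nodes and the `__tmp0`/`__tmp1` transients.

```python
from collections import defaultdict, deque

# Hypothetical dataflow graph of the reproducer's state, as (src, dst) edges.
# "dot0"/"dot1" stand in for the two Dot nodes, "add" for the summing tasklet.
EDGES = [
    ("x", "dot0"), ("y", "dot0"), ("dot0", "tmp0"), ("tmp0", "add"),
    ("v", "dot1"), ("w", "dot1"), ("dot1", "tmp1"), ("tmp1", "add"),
    ("add", "res"),
]


def ancestors(edges, node):
    """Return every node with a directed path to `node` (excluding `node`)."""
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)
    seen, stack = set(), list(preds[node])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(preds[n])
    return seen


def weakly_connected_components(nodes, edges):
    """Split `nodes` into components, ignoring edge direction and
    ignoring edges with an endpoint outside `nodes`."""
    adj = defaultdict(set)
    for src, dst in edges:
        if src in nodes and dst in nodes:
            adj[src].add(dst)
            adj[dst].add(src)
    components, seen = [], set()
    for start in sorted(nodes):  # sorted only for a deterministic order
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            n = queue.popleft()
            component.add(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        components.append(component)
    return components


def independent_subcomponents(edges, join_node):
    """Groups of nodes upstream of `join_node` that share no edges,
    i.e. candidate connected components for a state created by fission."""
    upstream = ancestors(edges, join_node)
    return weakly_connected_components(upstream, edges)


if __name__ == "__main__":
    for group in independent_subcomponents(EDGES, "add"):
        print(sorted(group))
```

For the reproducer's graph this prints `['dot0', 'tmp0', 'x', 'y']` and `['dot1', 'tmp1', 'v', 'w']`, i.e. the two dot products. One caveat: if both products read the same array (say `np.dot(x, y) + np.dot(x, w)`), a single shared read-only access node for `x` would merge the groups, so a real transformation would probably need to duplicate such read-only access nodes first.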