devito
Option blockinner=True might break jit-compilation
Current code generation with DSE=aggressive might break in some examples (e.g., TTI) if the backend compiler is not ICC, because GCC rejects #pragma omp simd when the following loop has more than one index variable. This should be fixed either by tweaking code generation or by waiting for the GCC developers to fix their compiler (in fact, the Devito-generated code is legal OpenMP code).
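A minimal sketch of the first fix option (tweaking code generation), using a hypothetical one-dimensional copy kernel rather than actual Devito output. The loop header keeps a single induction variable and the second index is derived from it inside the body. The names shift_copy, xs, xe, xo and the offset 4 are illustrative assumptions.

```c
/* Hypothetical kernel, not Devito-generated code.
 *
 * Form rejected by GCC: two index variables in the loop header, e.g.
 *   #pragma omp simd
 *   for (int x = xs, xo = xs + 4; x < xe; x += 1, xo += 1) ...
 *
 * Canonical form used below: one induction variable (x); the offset
 * index xo is recomputed from x in every iteration. */
void shift_copy(float *restrict out, const float *restrict in, int xs, int xe)
{
#pragma omp simd
    for (int x = xs; x < xe; x += 1) {
        const int xo = x + 4;   /* index into the (assumed) padded input */
        out[x] = in[xo];
    }
}
```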
EDIT 1: needs 3D blocking
EDIT 2: requires gcc >= 4.9 (earlier versions don't recognise #pragma omp simd)
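For reference, a hedged sketch of what "3D blocking" means here: all three space loops are tiled, presumably including the innermost one, which is what blockinner=True requests. The kernel blocked_add, the helper imin and the block sizes bx, by, bz are hypothetical. Each loop, including the innermost simd loop, keeps a single induction variable, and the clipped block bounds are computed before the loops so that every header stays in canonical form.

```c
/* Hypothetical 3D-blocked loop nest (not Devito output): x, y and z are
 * all tiled, and every loop has exactly one induction variable. */
static inline int imin(int a, int b) { return a < b ? a : b; }

void blocked_add(float *restrict u, const float *restrict v,
                 int nx, int ny, int nz, int bx, int by, int bz)
{
    for (int xb = 0; xb < nx; xb += bx)
        for (int yb = 0; yb < ny; yb += by)
            for (int zb = 0; zb < nz; zb += bz) {
                /* bounds of the current block, clipped at the domain edge */
                const int xe = imin(xb + bx, nx);
                const int ye = imin(yb + by, ny);
                const int ze = imin(zb + bz, nz);
                for (int x = xb; x < xe; x += 1)
                    for (int y = yb; y < ye; y += 1) {
#pragma omp simd
                        for (int z = zb; z < ze; z += 1) {
                            /* row-major offset of point (x, y, z) */
                            const long idx = ((long)x * ny + y) * nz + z;
                            u[idx] += v[idx];
                        }
                    }
            }
}
```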
A second instance of this bug, which might not be immediately obvious, arises when there are no loop-carried dependencies in the time loop and the DLE decides to parallelise across time. For example, a trivial kernel such as f = TimeFunction(name='f', grid=grid, time_order=2); op = Operator(Eq(f, 1.))
leads to the legal loop header:
for (int time = t_s, t0 = (time)%(3); time < t_e; time += 1, t0 = (time)%(3))
which GCC rejects when compiling with OpenMP. An example can be found in PR #457.
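A possible canonical rewrite of that header, sketched under assumptions: time is the only variable in the loop header, and the circular-buffer slot t0 (3 slots for time_order=2) is computed at the top of the body. The function record_slots and the slots array are hypothetical. To keep the sketch free of data races, each iteration writes its own element of slots instead of a shared 3-slot buffer.

```c
/* Canonical form: `time` is the only loop variable; t0 is derived from it
 * in the body instead of being updated in the loop header. */
void record_slots(int *slots, int t_s, int t_e)
{
#pragma omp parallel for
    for (int time = t_s; time < t_e; time += 1) {
        const int t0 = time % 3;   /* same value the original header assigned */
        slots[time - t_s] = t0;    /* one distinct element per iteration */
    }
}
```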
The PR https://github.com/opesci/devito/pull/427 was also trying to highlight the same issue, but I closed it as a duplicate. This is now the blocker for https://github.com/opesci/devito/pull/397. I have added a test for this in compilation_bug as an extension to the first-touch tests.
It seems to me that this kind of code is illegal according to the OpenMP standard. The reference states (Section 2.7.1, page 57) that parallel loops must be in canonical loop form, as defined in Section 2.6 (page 53):
A loop has canonical loop form if it conforms to the following:
for (init-expr; test-expr; incr-expr) structured-block
where init-expr must be one of the following:
var = lb
integer-type var = lb
random-access-iterator-type var = lb
pointer-type var = lb
This doesn't seem to allow the use of multiple variables.
In fact, there's a difference between my original issue and the later reports:
- in my original issue, the offending loop is a #pragma omp simd loop;
- in the other reports, it's a #pragma omp parallel loop that causes the compilation failure.
However, the reference requires canonical loop form in both cases. Probably the Intel compiler simply accepts more than the standard requires.
What we have to do is tweak the code generator to emit OpenMP-friendly code.
The solution appears to be the OpenMP linear clause, which according to the specs is already available in OpenMP 4.0.
Here is a nice example showing how to use it. It shouldn't be too complicated; I'll try it soon.
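A minimal sketch of the linear clause on a hypothetical gather loop (gather_every_other is an illustrative name, not Devito code). The second index j is declared before the loop instead of in its header, and linear(j:1) states that in logical iteration k its value is its initial value plus k, even though i advances by 2. This assumes a compiler with OpenMP 4.0 simd support (e.g. GCC >= 4.9 with -fopenmp).

```c
/* Second index j moved out of the loop header; the linear clause tells
 * OpenMP that j grows by 1 per logical iteration of the i loop. */
void gather_every_other(float *restrict out, const float *restrict in, int n)
{
    int j = 0;
#pragma omp simd linear(j:1)
    for (int i = 0; i < n; i += 2) {
        out[j] = in[i];   /* logical iteration k reads in[2k], writes out[k] */
        j++;
    }
}
```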
Is this still the case? If so, is there a reproducible script?
I suppose this is outdated now. It needs to be reopened as an opt-specific issue if this is still the case.
If you run TTI with loop blocking over all space loops, you'll see the error. But I have a workaround in mind, so I'd prefer to leave this issue open for now.