devito Option blockinner=True might break jit-compilation

Current code generation with DSE=aggressive might break in some examples (e.g., TTI) if the backend compiler is not ICC, as GCC doesn't like #pragma omp simd if the following loop has more then one index variable. This should be fixed by tweaking code generation, or by waiting for GCC guys to fix their compiler (in fact, the Devito generated code is legal OpenMP code)

EDIT 1: needs 3D blocking

EDIT 2: at least gcc >= 4.9 (previous versions don't recognise #pragma omp simd)

Aug 03 '17 09:08 FabioLuporini

A second instance of this bug, which might not be immediately obvious, arises when there are no loop-carried dependencies in the time loop and the DLE decides to parallelise across time. For example a trivial kernel such as f = TimeFunction(name='f', grid=grid, time_order=2); op = Operator(Eq(f, 1.)) leads to the legal loop header:

for (int time = t_s, t0 = (time)%(3); time < t_e; time += 1, t0 = (time)%(3))

which the GCC compiler errors on with OpenMP. An example can be found in PR #457.

Jan 05 '18 09:01 mlange05

The PR https://github.com/opesci/devito/pull/427 was also trying to highlight the same issue but I closed it as duplicate. This is now the blocker for https://github.com/opesci/devito/pull/397 . I have added a test for this in compilation_bug as an extension to the first touch tests.

Jan 10 '18 10:01 navjotk

It seems to me that this kind of code is illegal according to the OpenMP standard. The reference states on page 57 section 2.7.1 that parallel loops have to be in canonical loop form, as defined on page 53 section 2.6:

A loop has canonical loop form if it conforms to the following: for (init-expr; test-expr; incr-expr) structured-block

where init-expr must be one of the following: var = lb integer-type var = lb random-access-iterator-type var = lb pointer-type var = lb This doesn't seem to allow the use of multiple variables.

Jan 10 '18 13:01 jhueckelheim

in fact, there's a difference between my original issue and the later reports:

in my original issue, the incriminated loop is a pragma omp simd loop;
in the other reports, it's a pragma omp parallel that is causing the compilation failure.

However, the reference says that in both cases a loop should be in canonical form. Probably what's happening here is that the Intel compiler does more than the standard.

What we have to do is to tweak the compiler to generate omp-friendly code

Jan 10 '18 13:01 FabioLuporini

the solution to this appears to be the OpenMP linear clause, already available in OpenMP 4.0 according to the specs.

here a nice example showing how to use it. It shouldn't be too complicated; I'll try it soon

Feb 05 '19 10:02 FabioLuporini

Is this the case, if yes is there a reproducible script ?

Jan 27 '20 19:01 mloubout

I suppose this is outdated now. Need to be reopen as an opt specific one if still the case.

May 19 '20 12:05 mloubout

Is this the case, if yes is there a reproducible script ?

if you run tti and do loop blocking over all space loops, you'll see the error. But I've got in mind a work around , so I'd prefer to leave this issue open for now

May 19 '20 12:05 FabioLuporini

devito devito copied to clipboard

Option blockinner=True might break jit-compilation

devito
devito copied to clipboard