consider using pragmas to vectorize loops in the CPU backends
Reported by andrew.corrigan, Nov 2, 2010 Is it possible to add "#pragma ivdep" [1] on the line following each "#pragma omp parallel for" directive? This can facilitate vectorization with the Intel compiler, and my understanding is that allowing the compiler to ignore assumed vector dependencies should always be valid here: if there are vector dependencies present which cannot be safely ignored, then OpenMP parallelization will already introduce race conditions anyway.
[1] http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/cref_cls/common/cppref_pragma_ivdep.htm
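For illustration, here is a minimal sketch of the placement being requested. The saxpy function is a hypothetical stand-in, not code from Thrust, and the sketch assumes the Intel compiler accepts "#pragma ivdep" between the OpenMP directive and the loop. Other compilers, or builds without OpenMP enabled, are expected to ignore the pragmas they do not recognize, possibly with a warning.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example kernel (not part of Thrust): y[i] = a * x[i] + y[i].
// Each iteration reads and writes only element i, so there is no
// loop-carried dependence and the iterations can run in any order.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y)
{
    assert(x.size() == y.size());

    const float* xp = x.data();
    float* yp = y.data();
    const long n = static_cast<long>(y.size());

    // The OpenMP directive splits the iterations across threads; the ivdep
    // hint (assumed here to be honored only by the Intel compiler) tells the
    // vectorizer to ignore assumed dependences between xp and yp.
#pragma omp parallel for
#pragma ivdep
    for (long i = 0; i < n; ++i)
    {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The argument in the report applies directly: if iterations of this loop did depend on each other, running them on different threads would already be incorrect, so the ivdep hint adds no new requirement.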
Yes, this should be possible. Do you happen to know whether GCC or MSVC have similar mechanisms? Also, do you happen to have a test case on hand that substantially benefits from the optimization?
Comment 2 by andrew.corrigan, Nov 2, 2010 I think the answer is no for both GCC [1] and MSVC [2], unfortunately. I will get back to you with an example.
[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33426 [2] http://msdn.microsoft.com/en-us/library/d9x1s805.aspx
Comment 5 by project member wnbell, Jan 23, 2012 With the recent code reorganization I'm not sure where these additions would go. Arguably they could be applied to the scalar/ implementations with the appropriate #ifdef guards. Alternatively, we could supply an ICC backend and make that composable with backend::omp and backend::tbb.
Of course we'd first want to make sure that #pragma ivdep was worth the bother at all.
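As a rough sketch of the "#ifdef guards" option, the hint could be wrapped in a macro that expands to nothing on other compilers. The names HYPOTHETICAL_IVDEP and scalar_transform below are made up for illustration and do not exist in Thrust; the sketch also assumes that checking __INTEL_COMPILER is an adequate way to detect the Intel compiler.

```cpp
#include <cassert>

// HYPOTHETICAL_IVDEP is an illustrative name, not an existing Thrust macro.
// It expands to the ivdep pragma under the Intel compiler and to nothing
// elsewhere, so the loop below compiles unchanged on other compilers.
#if defined(__INTEL_COMPILER)
#  define HYPOTHETICAL_IVDEP _Pragma("ivdep")
#else
#  define HYPOTHETICAL_IVDEP
#endif

// Hypothetical scalar loop in the spirit of a scalar/ implementation:
// applies f to each element of [first, last) and writes the result to out.
// The caller must guarantee that iterations are independent of each other.
template <typename T, typename UnaryFunction>
void scalar_transform(const T* first, const T* last, T* out, UnaryFunction f)
{
    assert(first <= last);
    const long n = static_cast<long>(last - first);

    HYPOTHETICAL_IVDEP
    for (long i = 0; i < n; ++i)
    {
        out[i] = f(first[i]);
    }
}

// Small functor used to exercise the loop.
struct times_two
{
    int operator()(int x) const { return 2 * x; }
};
```

Keeping the guard in one macro would confine the compiler-specific part to a single place. Whether the hint pays off would still need to be measured on a real test case, as noted above.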
Forwarded from http://code.google.com/p/thrust/issues/detail?id=262