Layla Ghaffari
Thanks, @jrwrigh, for bearing with me. A lot has changed since the last time I contributed to this mini-app. There are still a couple of remaining tasks before merging: -...
> Cool. Squash or squash-merge?

Squash-merge is fine after I am done with the remaining tasks.
I am seeing a weird issue. The errors are not correct with rank 1. ``` $ build/fluids-navierstokes -problem euler_vortex -degree 3 -dm_plex_box_faces 1,1,2 -dm_plex_box_lower 0,0,0 -dm_plex_box_upper 125,125,250 -dm_plex_dim 3...
Testing with different arguments/problems, I see that the results could be different for n>1 as well.
> to make sure that I understand your issue: it works as expected with vtune, but with perf it's broken?

Yes, that's correct!
@jrwrigh, this is what we've been seeing with VTune. This was built using `-O2`. Although compiling with `-O3` yields nearly identical total execution times, it doesn't provide the detailed...
> [@laylagi](https://github.com/laylagi) Oof, yeah. That's not good at all. If y'all can pinpoint which QFunctions those `MatUnpack33` calls are coming from and see if the assembly of those functions...
> There could be differences between how GCC vs Clang vs Intel and ARM vs x86 at play here.

With Intel compilers and optimization flags `-xHost -ffp-contract=fast -fopenmp-simd -funsafe-math-optimizations -Rpass=loop-vectorize...
I forgot to mention that these for-loops are being vectorized with GCC and Clang, that's why I'm confused. Am I missing an optimization flag?
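For comparing what each compiler reports, here is a minimal sketch of the kind of loop in question. The kernel `scale_add` and the file name `scale_add.c` are hypothetical stand-ins, not code from the QFunctions discussed here; the `restrict` qualifiers are an assumption that the arrays do not alias, which is often what a vectorizer needs to know.

```c
#include <stddef.h>

// Hypothetical kernel (not from the mini-app): y[i] = a * x[i] + y[i].
// `restrict` promises the compiler that x and y do not overlap, which
// removes one common reason for a vectorizer to give up on a loop.
void scale_add(size_t n, double a, const double *restrict x, double *restrict y) {
  for (size_t i = 0; i < n; i++) {
    y[i] = a * x[i] + y[i];
  }
}
```

Each compiler family has its own way to ask for a vectorization report on a file like this, so the flags are not interchangeable:

- GCC: `gcc -O3 -march=native -fopt-info-vec-optimized -fopt-info-vec-missed -c scale_add.c`
- Clang: `clang -O3 -march=native -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c scale_add.c`
- Classic Intel: `icc -O3 -xHost -qopt-report=5 -qopt-report-phase=vec -c scale_add.c` (the report is written to a `.optrpt` file rather than printed to the terminal)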
> Which Intel compiler? (i.e. the latest clang-based ones, or the "classic" old ones)

The classic ones! Interesting! So, when gcc says: ``` hcurl_22_qf.h:16:36: optimized: loop vectorized using 32 byte vectors...