Lukas Krenz
This may increase performance by using the zmm registers more aggressively (see https://www.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/advanced-optimization-options/qopt-zmm-usage-qopt-zmm-usage.html). Draft because it is currently untested (and mainly because I do not want to forget that this flag...
**Describe the bug** AddressSanitizer complains that some matrices allocated by initializeProjectionMatrices are deallocated with a different allocator. This is a minor bug that occurs at the end of the program. **Expected behavior** Uses...
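An allocator mismatch of this kind typically arises when memory obtained from one allocation function is released with another. The following is a minimal sketch, not SeisSol's actual code: `FreeDeleter`, `AlignedMatrix`, and `allocateAligned` are hypothetical names, and the use of `std::aligned_alloc` is an assumption made for illustration. It shows one way to tie the deallocation function to the allocation so the two cannot diverge.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

// Hypothetical deleter: memory obtained from std::aligned_alloc must be
// released with std::free, never with delete or delete[].
struct FreeDeleter {
  void operator()(double* ptr) const noexcept { std::free(ptr); }
};

// Owning pointer that always frees with the matching deallocation function.
using AlignedMatrix = std::unique_ptr<double[], FreeDeleter>;

// Allocates `count` doubles aligned to `alignment` bytes (a power of two).
AlignedMatrix allocateAligned(std::size_t count, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // std::aligned_alloc requires the size to be a multiple of the alignment,
  // so round the byte count up to the next multiple.
  const std::size_t bytes = count * sizeof(double);
  const std::size_t padded = (bytes + alignment - 1) / alignment * alignment;
  return AlignedMatrix(
      static_cast<double*>(std::aligned_alloc(alignment, padded)));
}
```

Because the deleter is part of the pointer type, a caller cannot accidentally release the buffer with `delete[]`, which is the pattern AddressSanitizer reports as an alloc-dealloc mismatch.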
This is a quick and dirty rewrite of our proxy using the mneme library, which is a typesafe AoS/SoA container library. Please have a look. I'd like to integrate this...
Right now, SeisSol uses a timestep restriction of 1/(2N-1) for convergence order N. This is not strict enough. Dumbser, Michael, et al. "A unified framework for the construction of one-step...
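To make the current restriction concrete, here is a small sketch that evaluates the factor 1/(2N-1). The function names, the CFL-style form dt = c · factor · h / v, and the parameters `cellSize`, `maxWaveSpeed`, and `cflScale` are assumptions for illustration only; the stricter bound referred to above is not reproduced here.

```cpp
#include <cassert>
#include <cmath>

// Factor quoted in the issue: 1/(2N-1) for convergence order N.
constexpr double timestepFactor(int order) {
  return 1.0 / (2.0 * order - 1.0);
}

// Hypothetical CFL-style timestep: dt = cflScale * factor(N) * h / v.
// `cellSize` (h) and `maxWaveSpeed` (v) are illustrative parameters.
double stableTimestep(double cellSize, double maxWaveSpeed, int order,
                      double cflScale) {
  assert(order >= 1);
  assert(maxWaveSpeed > 0.0);
  return cflScale * timestepFactor(order) * cellSize / maxWaveSpeed;
}
```

For order 6 the factor is 1/11, so under this assumed form the admissible timestep shrinks by roughly an order of magnitude relative to order 1.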
**Describe the bug** The CI pipeline sometimes fails due to issues like:
```
2021-02-15T17:30:53.1630483Z [ 89%] Linking CXX executable SeisSol_proxy_Release_shsw_6_elastic
2021-02-15T17:30:53.5926485Z libSeisSol-lib.a(FlopCounter.cpp.o):(.bss+0x58): multiple definition of `libxsmm_num_total_flops'
2021-02-15T17:30:53.5928191Z CMakeFiles/SeisSol-proxy.dir/auto_tuning/proxy/src/flop_counter.cpp.o:(.bss+0x8): first defined here
...
```
Currently, if a user forgets to specify a boundary condition, SeisSol silently crashes after computing the LTS weights. We should add a warning, e.g. in https://github.com/SeisSol/SeisSol/blob/master/src/Geometry/PUMLReader.cpp#L298
This PR adds filtering to SeisSol (compare with the filtering section in Hesthaven's Nodal DG book). The idea is to decrease the coefficients of the high-order terms of the modal...
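As an illustration of the idea, the sketch below applies an exponential filter of the form described in Hesthaven and Warburton's book, σ(i) = exp(-α ((i - Nc)/(N - Nc))^s) for degrees above a cutoff Nc, to a one-dimensional vector of modal coefficients. It is not the code from this PR: the function names, the 1D layout (index equals polynomial degree), and the parameter choices are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exponential filter weight for a mode of the given degree.
// Modes up to `cutoff` are untouched; higher modes are damped smoothly.
double filterWeight(int degree, int maxDegree, int cutoff, double alpha,
                    int exponent) {
  assert(cutoff >= 0 && cutoff < maxDegree);
  if (degree <= cutoff) {
    return 1.0;
  }
  const double eta =
      static_cast<double>(degree - cutoff) / (maxDegree - cutoff);
  return std::exp(-alpha * std::pow(eta, exponent));
}

// Scales each modal coefficient in place; index i is assumed to hold
// the coefficient of the degree-i basis function.
void applyFilter(std::vector<double>& coefficients, int cutoff, double alpha,
                 int exponent) {
  const int maxDegree = static_cast<int>(coefficients.size()) - 1;
  for (std::size_t i = 0; i < coefficients.size(); ++i) {
    coefficients[i] *= filterWeight(static_cast<int>(i), maxDegree, cutoff,
                                    alpha, exponent);
  }
}
```

A common choice is α = 36, which makes the weight of the highest mode approximately machine epsilon in double precision; a larger even exponent leaves more of the intermediate modes nearly unchanged.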
Draft: needs to be merged with master; new pinning routines need to be checked on other clusters. This PR adds changes needed to run on Fugaku. This includes adding support...
This pull request adds hardware-accelerated routines for CRC32 and CRC32C for Arm AArch64 CPUs. The changes here have been tested on NVIDIA Grace. In detail, it contains routines for:...
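For readers unfamiliar with the technique, here is a minimal sketch of a CRC32C routine that uses the Arm ACLE intrinsics `__crc32cd` and `__crc32cb` when the compiler defines `__ARM_FEATURE_CRC32`, and falls back to a bitwise software loop otherwise. This is not the code from the pull request: the function name `crc32c` and the overall structure are assumptions, and the hardware path assumes a little-endian target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

// CRC32C (Castagnoli) with initial value and final XOR of 0xFFFFFFFF.
std::uint32_t crc32c(const void* data, std::size_t length) {
  const auto* bytes = static_cast<const unsigned char*>(data);
  std::uint32_t crc = 0xFFFFFFFFu;
#if defined(__ARM_FEATURE_CRC32)
  // Hardware path: consume 8 bytes per instruction, then the tail bytewise.
  while (length >= 8) {
    std::uint64_t word;
    std::memcpy(&word, bytes, sizeof(word));  // avoids unaligned access UB
    crc = __crc32cd(crc, word);
    bytes += 8;
    length -= 8;
  }
  while (length-- > 0) {
    crc = __crc32cb(crc, *bytes++);
  }
#else
  // Portable fallback: bitwise update with the reflected polynomial.
  while (length-- > 0) {
    crc ^= *bytes++;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
#endif
  return crc ^ 0xFFFFFFFFu;
}
```

With GCC or Clang the hardware path is typically enabled by a flag such as `-march=armv8-a+crc`. Both paths should agree on the standard check value: the CRC32C of the ASCII string "123456789" is 0xE3069283.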
This PR adds inlining hints to several functions of the ConcurrentHashMap. With clang17 and NVIDIA Grace, this speeds up the benchmarks for begin() by ~35-40% and find() by ~15-20%. The...