Tuomas Koskela

Results 23 issues of Tuomas Koskela

This PR removes pointers to `part_array` in `multiply_module` and passes slices of `part_array` to subroutines instead. It also tries to clarify the indexing of `part_array` to make it clearer which...

area: main-source
improves: stability
priority: minor
type: maintenance

The individual test cases (`def test_001`, `def test_002`, etc.) in `test_check_output.py` repeat almost the same three lines of code for every test case. This could be simplified to parametrize on...

area: testing
priority: minor
type: maintenance
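
A minimal sketch of what the parametrized version could look like, assuming pytest is the test runner. The case names mirror the `test_001`, `test_002` naming from the issue, but `read_reference`, `read_result`, the stored numbers, and the tolerance are hypothetical stand-ins for the repeated lines in `test_check_output.py`, which are not shown in the issue text.

```python
import pytest

# Hypothetical data: the real tests would read these values from the
# reference and output files in each test directory.
REFERENCE = {"test_001": -10.0, "test_002": -20.5, "test_003": -3.25}
COMPUTED = {"test_001": -10.0 + 1e-9, "test_002": -20.5, "test_003": -3.25 - 1e-9}

CASES = sorted(REFERENCE)


def read_reference(case):
    """Hypothetical helper: return the reference value for one test case."""
    return REFERENCE[case]


def read_result(case):
    """Hypothetical helper: return the freshly computed value for one test case."""
    return COMPUTED[case]


# One test body replaces the near-identical test_001, test_002, ... functions;
# pytest generates a separate test item for every entry in CASES.
@pytest.mark.parametrize("case", CASES)
def test_check_output(case):
    # The tolerance is an assumption, not taken from the original tests.
    assert read_result(case) == pytest.approx(read_reference(case), abs=1e-6)
```

With this layout, adding a test case means adding one entry to the case list rather than another copy of the function, and a single case can still be selected with pytest's `-k` option.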

## Pros
- Default target for PRs
- Automatic closing of issues via PRs

## Cons
- Cloned by default -- less stable than `master`
- Would have to update...

help wanted
type: request

Things to look out for
- Can we overlap communication and calculation in https://github.com/OrderN/CONQUEST-release/blob/6bf8f4a8c20fd4fa8f1c7baeb8a6b1f23a6d2408/src/multiply_module.f90#L251 _Originally posted by @tkoskela in https://github.com/OrderN/CONQUEST-release/issues/248#issuecomment-1697085794_
- Can we use OpenMP tasks to both receive data...

The subroutines https://github.com/OrderN/CONQUEST-release/blob/1378f0359b798362b84177bc4288e97ceff17824/src/PAO_grid_transform_module.f90#L98 and https://github.com/OrderN/CONQUEST-release/blob/1378f0359b798362b84177bc4288e97ceff17824/src/PAO_grid_transform_module.f90#L270 duplicate almost all of their code, with only minor differences. It would be good for code stability to move the common code into a utility...

area: PAOs
improves: stability
priority: minor
type: maintenance

In multithreaded runs, ScaLAPACK calls to `pzhevgx` are becoming a significant non-threaded bottleneck. At least the Intel MKL implementation of ScaLAPACK does not gain performance by adding threads to MKL...

improves: speed
type: question

Take one of the OpenMP parallel loops from #195 and rewrite it using `do concurrent`. Compare performance.
https://www.intel.com/content/www/us/en/docs/fortran-compiler/developer-guide-reference/2023-1/do-concurrent.html
https://developer.nvidia.com/blog/accelerating-fortran-do-concurrent-with-gpus-and-the-nvidia-hpc-sdk/

area: main-source
priority: minor
time: days

There's been some ambiguity (at least in my head) about whether the time spent in this loop https://github.com/OrderN/CONQUEST-release/blob/4162a3c6d799960ba88bc6e92944f4e54794362e/src/calc_matrix_elements_module.f90#L531-L539 was being spent on the `axpy` call itself or on the data access. I...

area: main-source
improves: speed
type: enhancement

The second hotspot shown in profiling in #197 is https://github.com/OrderN/CONQUEST-release/blob/4162a3c6d799960ba88bc6e92944f4e54794362e/src/PAO_grid_transform_module.f90#L394-L400 Disregarding the `if` statement, which we can remove, it looks like the main issue is `pao_elem_derivative_2` (and all functions it...

area: main-source
improves: speed