Ye Luo comments

Results 357 comments of Ye Luo

Reduce synchronization needed

@carlobertolli I got confused. Were you **not** asking for performance tests to show the benefit of your optimization prototype?

Reduce synchronization needed

Have you seen a huge reduction in a trace timeline with the following example?
```
int a;
#pragma omp target map(tofrom: a)
{ a = a*2; }
```
Since the example...
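For anyone who wants to trace the snippet above, here is a minimal self-contained sketch of it. The function names `double_on_device` and `check_double_on_device` are hypothetical and not from the original comment. Without an OpenMP offload build the pragma is simply ignored and the block runs on the host, so the sketch only shows the shape of the region, not its synchronization behavior.

```cpp
#include <cassert>

// Hypothetical wrapper around the one-line target region from the comment.
// With offload enabled, map(tofrom: a) copies `a` to the device before the
// region and back to the host afterwards; otherwise the block runs on the host.
int double_on_device(int a)
{
  #pragma omp target map(tofrom: a)
  {
    a = a * 2;
  }
  return a;
}

// Hypothetical self-check: the result must be the same on host and device.
void check_double_on_device()
{
  assert(double_on_device(21) == 42);
}
```

Calling `check_double_on_device()` from `main` under a profiler should produce a trace with a single small target region, which makes any extra synchronization calls easy to spot.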

@carlobertolli Here is a benchmark you may try from the QMCPACK performance tests: the performance-NiO-a64-e768-batched_driver-w16-DU32-1-4 test case. Make a real+mixed precision build. Here is the recipe for `AOMP 14.0_1` ```...

Reduce synchronization needed

From this offload region: https://github.com/ye-luo/miniqmc/blob/c0b6c89746e424f5b198bd63ad63dd5bb5cd12cf/src/QMCWaveFunctions/einspline_spo_omp.cpp#L413 I still see two synchronizations (`signal_wait_scacquire`) being used. ![Screenshot from 2022-02-25 21-51-11](https://user-images.githubusercontent.com/1454251/155828050-02c187e2-23fe-49da-ae53-03b38a03fecd.png) However, with the CUDA plugin, only one synchronization is needed. ![Screenshot from 2022-02-25 21-49-20](https://user-images.githubusercontent.com/1454251/155828062-5e603afe-91e3-40f1-97a7-28009feaef80.png)

Reduce synchronization needed

I'm thinking of leaving it open here. Once we improve the performance in upstream and propagate the change back to AOMP, then we close it here.

std::complex support needed

The command line is on the first line of the description. I tried removing the offload options and got the following.
```
$ nm a.out |grep muldc
U __muldc3@@GCC_4.0.0
```
from...
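For context, `__muldc3` is the runtime helper that GCC and Clang may call for double-precision complex multiplication when fast-math style options are not in effect. The sketch below is one operation that typically produces such a reference. Whether the original reproducer used exactly this operation is an assumption, and the function name `multiply` is hypothetical.

```cpp
#include <cassert>
#include <complex>

// Hypothetical example: a plain std::complex<double> product.
// Without -ffast-math (or a similar option), the compiler may lower this to
// a call to __muldc3 so that NaN and infinity cases are handled correctly.
std::complex<double> multiply(std::complex<double> x, std::complex<double> y)
{
  return x * y;
}

// Hypothetical self-check: (1 + 2i) * (3 + 4i) = -5 + 10i.
void check_multiply()
{
  assert(multiply({1.0, 2.0}, {3.0, 4.0}) == std::complex<double>(-5.0, 10.0));
}
```

Running `nm` on an object file built from this sketch is a quick way to see whether a given set of compiler options leaves an undefined `__muldc3` reference.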

std::complex support needed

11.7-1 works with default optimization, -O3, and -O3 -ffast-math. As soon as I add -g, the compiler stops.

The additional thread for GPU

FYI: From a miniQMC run
```
OMP_NUM_THREADS=8 rocprof --hsa-trace ./bin/check_spo -n 1
```
3916422 is not an OpenMP thread; it only calls `hsa_system_get_info` many times during initialization. I'm referring...

OpenMP HIP Memory interop issue

> It's not a fix for hipMemset, but you might be interested in hsa_amd_memory_fill as an alternative

Thank you for the info. I don't really use hipMemset in the application...

OpenMP HIP Memory interop issue

In the application, I use memory pointers from omp_target_alloc to call hipBLAS or HIP kernels. I was expecting hipMemset to be a convenient routine which ends up calling a kernel. It...