Andrew Myers
This test isn't set up to do any domain decomposition or parallelization - it's based on `FArrayBox`, not `MultiFab`. If you run it on more MPI tasks, every task is duplicating...
The cache effects won't necessarily be small, though, if these kernels are spending a significant amount of time streaming data from main memory. In fact, if I reduce the size...
Thank you for reporting this. @houjun, do you have any ideas? One thing that might help is to try to trigger this using `amrex/Tests/HDF5Benchmark`. If you change the distribution map...
Hi @BenWibking, I think the main things for merging this are: 1. Instead of changing the behavior of `amrex::OpenMP::get_max_threads()`, etc. globally, could you isolate the changes to the `Device::gpuStream` function...
> > I think the main things for merging this are: 1. Instead of changing the behavior of `amrex::OpenMP::get_max_threads()`, etc. globally, could you isolate the changes to the `Device::gpuStream` function...
Hi Mark, Yes, `amrex::Random()` is thread safe. When OpenMP is on, each thread will have its own dedicated generator that is totally independent of the others.
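`amrex::Random()` itself is C++, but the per-thread-generator pattern it describes can be sketched in Python with the standard library. This is a conceptual illustration only, not AMReX's implementation; the seeding scheme and helper names here are made up for the example.

```python
import random
import threading

# One independent generator per thread, stored in thread-local storage,
# so no locking is needed and the streams never interfere.
_local = threading.local()

def thread_rng():
    # Lazily create this thread's dedicated generator. Seeding by thread
    # id is an arbitrary choice for the sketch; a real library would use
    # a proper per-thread seeding scheme.
    if not hasattr(_local, "rng"):
        _local.rng = random.Random(threading.get_ident())
    return _local.rng

def draw(n):
    rng = thread_rng()
    return [rng.random() for _ in range(n)]

results = {}

def worker(tid):
    # Each worker draws from its own generator, independent of the others.
    results[tid] = draw(3)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After the join, every thread has produced its own stream of draws without any shared mutable generator state.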
Sure, we can add wrappers for these.
It looks straightforward-ish to do this operation on the GPU. Not too dissimilar from operations we already do for particles. I could give it a try this week.
I found that turning off mask caching helped with memory usage:

```diff
diff --git a/yt/data_objects/grid_patch.py b/yt/data_objects/grid_patch.py
index 966f0a068..e826e4f78 100644
--- a/yt/data_objects/grid_patch.py
+++ b/yt/data_objects/grid_patch.py
@@ -42,7 +42,7 @@ class AMRGridPatch(YTSelectionContainer):
     _num_ghost_zones =...
```
IIRC, the mask was getting allocated to cover the entire domain on every process, not just for the subset of the domain that the process owned. Even with packing, this...
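A back-of-envelope sketch of why a full-domain mask on every process hurts. The domain size and rank count below are assumed for illustration, not taken from the report:

```python
# Assumed numbers for illustration only.
domain_cells = 1024**3    # total cells in the domain (assumed)
nranks = 64               # number of MPI ranks (assumed)
bytes_per_cell = 1        # one byte per boolean mask entry

# Mask covering the entire domain, allocated on every rank:
full_mask_per_rank = domain_cells * bytes_per_cell

# Mask covering only the subset of cells each rank owns:
owned_mask_per_rank = (domain_cells // nranks) * bytes_per_cell

print(f"full-domain mask per rank:  {full_mask_per_rank / 2**30:.2f} GiB")
print(f"owned-subset mask per rank: {owned_mask_per_rank / 2**20:.2f} MiB")
```

Under these assumptions the full-domain mask costs 1 GiB on every rank regardless of how many ranks you add, while the owned-subset mask shrinks as the rank count grows.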