
test or example using aggressive coarsening on gpus

Open chris-schroeder opened this issue 3 years ago • 20 comments

Hi all. I can't seem to get aggressive coarsening to work on gpus. Is there a test or example of this that should work? src/test/TEST_ij/agg_interp.sh is seg faulting like nobody's business. ParCSRHybrid without aggressive coarsening seems to be working.

chris-schroeder avatar May 05 '21 22:05 chris-schroeder

PS It could be that I need to compile differently, or one of a few dozen other things. It'd be good to have a working example. I'm building and running on rzansel. And I'm using 3f12d47651ecf18644d95bc9e975c572fe44ccf3.

chris-schroeder avatar May 05 '21 22:05 chris-schroeder

From Ruipeng (helpful!)

Hi, Chris,

BoomerAMG should work with aggressive coarsening. Using ij driver, I can run

mpirun -np 1 ./ij -n 100 100 100 -pmis -keepT 1 -rlx 7 -w 0.85 -exec_device -mm_cusparse 0 -agg_nl 1 -agg_interp 5 -solver 1

without problems. Hypre is configured without UVM (just --with-cuda). There seem to be segfaults for the hybrid solvers; I am looking at it.
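For reference, the same options set through the BoomerAMG C interface look roughly like this (the flag-to-call mapping is my reading of the ij driver, so treat it as a sketch; -exec_device and -mm_cusparse are global GPU settings, and -solver 1 wraps the cycle in PCG, neither of which is shown):

    /* sketch: needs HYPRE.h and HYPRE_parcsr_ls.h; A, b, x are assumed
       to already live in device memory */
    void amg_agg_solve(HYPRE_ParCSRMatrix A, HYPRE_ParVector b, HYPRE_ParVector x)
    {
       HYPRE_Solver amg;
       HYPRE_BoomerAMGCreate(&amg);
       HYPRE_BoomerAMGSetCoarsenType(amg, 8);    /* -pmis           */
       HYPRE_BoomerAMGSetKeepTranspose(amg, 1);  /* -keepT 1        */
       HYPRE_BoomerAMGSetRelaxType(amg, 7);      /* -rlx 7 (Jacobi) */
       HYPRE_BoomerAMGSetRelaxWt(amg, 0.85);     /* -w 0.85         */
       HYPRE_BoomerAMGSetAggNumLevels(amg, 1);   /* -agg_nl 1       */
       HYPRE_BoomerAMGSetAggInterpType(amg, 5);  /* -agg_interp 5   */
       HYPRE_BoomerAMGSetup(amg, A, b, x);
       HYPRE_BoomerAMGSolve(amg, A, b, x);
       HYPRE_BoomerAMGDestroy(amg);
    }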

  1. For GPUs, HYPRE_BoomerAMGSetModuleRAP2 should be 1. HYPRE_BoomerAMGSetRAP2 can be 0 or 1 (0 is usually better than 1 with recent improvements).

  2. The interfaces of HYPRE_SStructMatrixSetBoxValues and HYPRE_SStructMatrixAssemble have not changed. If you want to set up on the CPU, you need to provide CPU pointers. The hypre_TMemcpy calls in sstruct.c show the case where the data start on the CPU and need to be copied to the device, so that HYPRE_SStructMatrixSetBoxValues can be called with GPU pointers to set up on GPUs (see the sketch after this list).

  3. I am not sure why the 2nd solve segfaults. Is it with the hybrid solver?
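In code, points 1 and 2 come out roughly as below (a sketch modeled on what the sstruct driver does; part, ilower, iupper, var, nentries, entries, nvalues, h_values, and the solver handle amg are placeholder names, and hypre_TAlloc/hypre_TMemcpy are internal hypre_-level routines):

    /* point 2: values assembled on the host, copied to the device,
       then passed to SetBoxValues as GPU pointers */
    HYPRE_Real *d_values = hypre_TAlloc(HYPRE_Real, nvalues, HYPRE_MEMORY_DEVICE);
    hypre_TMemcpy(d_values, h_values, HYPRE_Real, nvalues,
                  HYPRE_MEMORY_DEVICE, HYPRE_MEMORY_HOST);   /* host -> device */
    HYPRE_SStructMatrixSetBoxValues(A, part, ilower, iupper, var,
                                    nentries, entries, d_values);
    HYPRE_SStructMatrixAssemble(A);

    /* point 1: on the BoomerAMG solver handle */
    HYPRE_BoomerAMGSetModuleRAP2(amg, 1);
    HYPRE_BoomerAMGSetRAP2(amg, 0);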

Thanks

-Ruipeng

chris-schroeder avatar May 05 '21 23:05 chris-schroeder

Thanks, Ruipeng! The 2nd solve seg faults are for the hybrid. I have not managed to run a first solve with the non-hybrid BoomerAMG solver. Are you using mpirun on lassen?

chris-schroeder avatar May 05 '21 23:05 chris-schroeder

Thanks, Ruipeng! The 2nd solve seg faults are for the hybrid. I have not managed to run a first solve with the non-hybrid BoomerAMG solver. Are you using mpirun on lassen?

I was running on ray with mpirun. But lassen should be fine too, with lrun or jsrun.

liruipeng avatar May 05 '21 23:05 liruipeng

@chris-schroeder There's currently no API to set AggInterpType for the hybrid solver, so the default (4) is used, which crashes on GPUs. I will add an API function for it.

liruipeng avatar May 05 '21 23:05 liruipeng

I can run your example. Thanks. If I'm setting up on the CPU and then running on the GPU, do I need to call hypre_ParCSRMatrixMigrate() at some point, as in GenerateLaplacian()?

chris-schroeder avatar May 06 '21 00:05 chris-schroeder

... also, is hypre_bind_device needed now?

chris-schroeder avatar May 06 '21 00:05 chris-schroeder

Okay! I have the non-hybrid with aggressive coarsening running on gpus now. Hopefully the issue with the hybrid solvers is just the AggInterpType which is easy to fix. Thanks again!

chris-schroeder avatar May 06 '21 05:05 chris-schroeder

hypre_ParCSRMatrixMigrate

@chris-schroeder Yes, if you set up the object on the CPU with host memory, you have to move it to the GPU before you can give it to BoomerAMG. hypre_ParCSRMatrixMigrate does this for ParCSR matrices.
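For example (a sketch; parcsr_A, par_b, par_x stand for whatever came back from HYPRE_IJMatrixGetObject / HYPRE_IJVectorGetObject, the Migrate routines are hypre_-level and need the internal _hypre_parcsr_mv.h header, and I believe hypre_ParVectorMigrate is the vector analogue, but check the name):

    /* host-assembled objects moved to the device before BoomerAMG sees them */
    hypre_ParCSRMatrixMigrate((hypre_ParCSRMatrix *) parcsr_A, HYPRE_MEMORY_DEVICE);
    hypre_ParVectorMigrate((hypre_ParVector *) par_b, HYPRE_MEMORY_DEVICE);  /* name assumed */
    hypre_ParVectorMigrate((hypre_ParVector *) par_x, HYPRE_MEMORY_DEVICE);  /* name assumed */
    HYPRE_BoomerAMGSetup(amg, (HYPRE_ParCSRMatrix) parcsr_A,
                         (HYPRE_ParVector) par_b, (HYPRE_ParVector) par_x);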

liruipeng avatar May 06 '21 16:05 liruipeng

... also, is hypre_bind_device needed now?

We no longer do device bindings in HYPRE_Init (because users don't want hypre to change their bindings). So you need to call cudaSetDevice before HYPRE_Init, and hypre will just use that device. We put what we had before in hypre_bind_device, so you can call it and the behavior will be the same as before.
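Concretely, something like the following before HYPRE_Init (a sketch; the round-robin choice of device per rank is just one common convention, and mpibind/lrun make it unnecessary):

    /* needs <mpi.h>, <cuda_runtime.h>, and HYPRE.h; runs after MPI_Init */
    int rank, ndevices;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaGetDeviceCount(&ndevices);
    cudaSetDevice(rank % ndevices);   /* pick a GPU for this rank      */
    HYPRE_Init();                     /* hypre uses the current device */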

liruipeng avatar May 06 '21 16:05 liruipeng

Okay! I have the non-hybrid with aggressive coarsening running on gpus now. Hopefully the issue with the hybrid solvers is just the AggInterpType which is easy to fix. Thanks again!

Awesome. It's easy to fix. I will let you know.

liruipeng avatar May 06 '21 16:05 liruipeng

What I am doing seems to be working without calling hypre_bind_device or hypre_ParCSRMatrixMigrate. Should this not be working? Are the solves running on the CPUs, even though I have called HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE) and HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE)?

chris-schroeder avatar May 06 '21 17:05 chris-schroeder

What I am doing seems to be working without calling hypre_bind_device or hypre_ParCSRMatrixMigrate. Should this not be working? Are the solves running on the CPUs, even though I have called HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE) and HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE)?

Did you configure with --enable-unified-memory? I think the solvers are running on GPUs. hypre_bind_device only changes which device is used by each MPI rank; hypre should work with or without it.
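Putting the global settings together, the calls Chris describes amount to roughly this (a sketch of my understanding of the usual order):

    HYPRE_Init();
    HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);  /* where hypre allocates new objects */
    HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);   /* where the kernels run             */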

liruipeng avatar May 06 '21 18:05 liruipeng

I am not compiling with --enable-unified-memory. From my config.log:

configure:8958: Configuring with --with-cuda=yes without unified memory.

I think HYPRE must be doing the migration for me (which is great!) given that I allocate the Vec and Matrix memory on the HOST (and have MemoryLocation DEVICE and ExecutionPolicy DEVICE).

It makes sense that hypre_bind_device is not necessary - but I wonder what happens to the memory pools if the processes are unbound. ...

chris-schroeder avatar May 06 '21 19:05 chris-schroeder

Super. Thank you Ruipeng!

chris-schroeder avatar May 07 '21 05:05 chris-schroeder

Everything seems to be working, but performance so far looks about the same as with the Jan 2020 settings. Not pressing, I think, but when you have some time, could you send some performance numbers for tests like the HYDRA-like benchmark we discussed the last time around? It would be very good to come to a consensus on performance again, to make sure I'm getting the performance I should. The tarballs from Nov 2019 are still at /p/gpfs1/crs/sph-2M-40b-2.tar and /p/gpfs1/crs/sph-8M-160b-2.tar on the CZ (note the '-2' before '.tar').

Also, I'm now realizing that it would probably be good to have HybridSetRAP2 and HybridSetModuleRAP2 accessors too, but again, when you have some time - unless you think it will significantly improve performance.

chris-schroeder avatar May 07 '21 16:05 chris-schroeder

I am not compiling with --enable-unified-memory. From my config.log:

configure:8958: Configuring with --with-cuda=yes without unified memory.

I think HYPRE must be doing the migration for me (which is great!) given that I allocate the Vec and Matrix memory on the HOST (and have MemoryLocation DEVICE and ExecutionPolicy DEVICE).

It makes sense that hypre_bind_device is not necessary - but I wonder what happens to the memory pools if the processes are unbound. ...

Regarding hypre_bind_device: by default all MPI ranks are bound to GPU 0, so performance may suffer significantly without the binding because the ranks compete for the same GPU. But if you run with mpirun + mpibind or with lrun, it is not an issue, since each rank only sees one (different) GPU.

liruipeng avatar May 07 '21 16:05 liruipeng

Everything seems to be working, but performance so far looks about the same as with the Jan 2020 settings. Not pressing, I think, but when you have some time, could you send some performance numbers for tests like the HYDRA-like benchmark we discussed the last time around? It would be very good to come to a consensus on performance again, to make sure I'm getting the performance I should. The tarballs from Nov 2019 are still at /p/gpfs1/crs/sph-2M-40b-2.tar and /p/gpfs1/crs/sph-8M-160b-2.tar on the CZ (note the '-2' before '.tar').

Also, I'm now realizing that it would probably be good to have HybridSetRAP2 and HybridSetModuleRAP2 accessors too, but again, when you have some time - unless you think it will significantly improve performance.

Thanks @chris-schroeder. I will do the performance benchmark.

liruipeng avatar May 07 '21 16:05 liruipeng

Hi, @liruipeng. Any news re the performance benchmark?

chris-schroeder avatar May 20 '21 22:05 chris-schroeder

Hi, @liruipeng. Any news re the performance benchmark?

Hi @chris-schroeder, I will update you after the new release is out. Sorry for the delay.

liruipeng avatar May 23 '21 00:05 liruipeng