Grid issues

GPU Benchmark_ITT segfaults with MPI and ranks > 1

9

Hi, Benchmark_ITT segfaults in an MPI run when nranks > 1, just after printing "Initialised RNGs". I have observed the same segfault on Perlmutter as well as on a local cluster. $...

james-simone

hipcc on Crusher: function bcopy undefined (compiler does not have openmp enabled?)

1

The builtin function bcopy is undefined when the compiler does not have OpenMP enabled. The workaround is to include the header strings.h. ` diff --git a/Grid/threads/Threads.h b/Grid/threads/Threads.h index 6887134d..52a3dfde 100644 --- a/Grid/threads/Threads.h...

james-simone

Create a version of Benchmark_ITT including Clover instead of Wilson

I am requesting a version of Benchmark_ITT that tests Clover performance in place of Wilson. It would be nice if the comparison point produced separate metrics for each of Domain...

james-simone

MPI2 romio321 library fails when reading >= 2GB per rank

2

# Git commit
develop HEAD 135808dcfa767edf988976ae31d2876bb6389f8b
# Target Platform
University of Edinburgh Extreme Scaling system “Tursa”
Each node: 2 x AMD ROME EPYC 32, Nvidia A100 (40GB), 1TB RAM `Linux...

mmphys

Compiling Grid for AMD GPUs

31

I know the wiki says there is currently no support for AMD GPUs. But I saw commits concerning HIP. Is there a way one could try experimenting with Grid on...

philomat

use accelerator for setCheckerboard in RHMC

2

This goes some way towards #378. One-flavour RHMC is 10–20% faster with this, and gives identical results (i.e. generated configurations, excluding the header, are bitwise identical). Things that could potentially...

edbennett

HMC on A100 spends large amounts of time in memory copy

3

Since benchmarks show we can get 1.7TFLOP/s for the Wilson kernel on one A100 but only about 230GFLOP/s on an AMD Rome node, it would seem reasonable to expect that...

edbennett

Very low acceptance for SU(2) 1 adjoint flavour RHMC

2

I'm seeing very low acceptance rates when running the RHMC for SU(2) with one adjoint flavour when compared to what I believe are exactly the same run parameters for HiRep....

edbennett

BlockConjugateGradient.h fails to compile with gcc-9.1.0

The operator() member calls Lattice_reduction.h - sliceMulMatrix which encounters an ambiguous overload of the operator* in Tensor_arith_scalar.h: ../Grid/install-grid-gpu-cuda/include/Grid/lattice/Lattice_reduction.h: In instantiation of 'void Grid::sliceMaddMatrix(Grid::Lattice&, Eigen::MatrixXcd&, const Grid::Lattice&, const Grid::Lattice&, int,...

detar

Cuda error out of memory with Wilson Fermions on Volta V100 GPUs

Hi, I am having trouble running some applications with Grid (develop) on Marconi100 at CINECA (2 x IBM POWER AC922 with 4 NVIDIA Volta V100 GPUs, NVLink 2.0). I am using ```...

LupoA
