quda

QUDA is a library for performing calculations in lattice QCD on GPUs.

Results 174 quda issues

When compiling with the flag `-DQUDA_GPU_ARCH=sm_75` for the Turing architecture, warnings appear as shown below: `ptxas warning : Value of threads per SM for entry ... is out of range. .minnctapersm...

Add an `instantiate` item for `copy_gauge_field` and `copy_gauge_field_offset` for the gauge orders, etc. One tricky thing is that with the lists in `instantiate.h` it becomes hard to know which file...

clean-up

Instead of using a traditional implementation of classical or modified Gram-Schmidt (or a hybrid thereof), (block-)orthonormalization can be formulated as a thin QR, which is implemented in practice via a...

feature
optimization
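
To illustrate the idea of orthonormalizing a block through a thin QR factorization, here is a minimal sketch. The issue preview is truncated before it names the method used in practice, so the choice of CholeskyQR below is an assumption made for illustration only. The names `Vec`, `Block`, `dot` and `cholesky_qr` are hypothetical, not part of QUDA, and the sketch uses real numbers and host code rather than complex fields on a GPU.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

using Vec = std::vector<double>;
using Block = std::vector<Vec>; // Block[j] is the j-th column vector (hypothetical layout)

// Real inner product of two equally sized vectors.
double dot(const Vec &a, const Vec &b)
{
  double s = 0.0;
  for (std::size_t e = 0; e < a.size(); ++e) s += a[e] * b[e];
  return s;
}

// Thin QR of V via CholeskyQR (assumed technique, not confirmed by the issue):
// form the Gram matrix G = V^T V, factor G = R^T R, then solve Q R = V for Q.
Block cholesky_qr(const Block &V)
{
  const std::size_t k = V.size();

  // Gram matrix: all inner products are computed up front.
  std::vector<Vec> G(k, Vec(k, 0.0));
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t j = 0; j < k; ++j) G[i][j] = dot(V[i], V[j]);

  // Cholesky factorization G = R^T R with R upper triangular.
  std::vector<Vec> R(k, Vec(k, 0.0));
  for (std::size_t j = 0; j < k; ++j) {
    for (std::size_t i = 0; i <= j; ++i) {
      double s = G[i][j];
      for (std::size_t p = 0; p < i; ++p) s -= R[p][i] * R[p][j];
      if (i == j) {
        if (s <= 0.0) throw std::runtime_error("block is numerically rank deficient");
        R[i][i] = std::sqrt(s);
      } else {
        R[i][j] = s / R[i][i];
      }
    }
  }

  // Triangular solve: Q[j] = (V[j] - sum_{i<j} R[i][j] Q[i]) / R[j][j].
  Block Q(k);
  for (std::size_t j = 0; j < k; ++j) {
    Vec q = V[j];
    for (std::size_t i = 0; i < j; ++i)
      for (std::size_t e = 0; e < q.size(); ++e) q[e] -= R[i][j] * Q[i][e];
    for (std::size_t e = 0; e < q.size(); ++e) q[e] /= R[j][j];
    Q[j] = q;
  }
  return Q;
}
```

In this formulation the inner products are gathered into one Gram-matrix step instead of being interleaved with vector updates, which is one reason such schemes are attractive for blocks. CholeskyQR works with the Gram matrix, whose condition number is the square of that of the block, so it can lose accuracy on ill-conditioned input.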

* Modify dirac_[improved_]staggered.cpp to use the full operator for calling `MdagM` as opposed to separate even/odd parts. In theory this does the right thing: ``` Dslash(*tmp1, in, QUDA_INVALID_PARITY); DslashXpay(out, *tmp1,...

feature
optimization

Reduction abstraction is presently broken for non-summation reductions. While the abstracted launch can be passed different reducers for the kernel, the MPI reduction presently assumes that summation is being performed....

bug
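
The failure mode described in this issue can be sketched in plain C++ without MPI. The reducer types `Plus` and `Maximum` and the functions `local_reduce`, `global_reduce` and `global_reduce_sum_only` are hypothetical stand-ins, not QUDA's actual abstraction. Each inner vector plays the role of one rank's local data, which is an assumption made to keep the example self-contained.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical reducers: each supplies an identity element and a combine rule.
struct Plus {
  static double init() { return 0.0; }
  double operator()(double a, double b) const { return a + b; }
};

struct Maximum {
  static double init() { return std::numeric_limits<double>::lowest(); }
  double operator()(double a, double b) const { return std::max(a, b); }
};

// Reduction over the data held by a single rank (stands in for the kernel).
template <typename Reducer>
double local_reduce(const std::vector<double> &v, Reducer r)
{
  double acc = Reducer::init();
  for (double x : v) acc = r(acc, x);
  return acc;
}

// Combines per-rank partial results with the same reducer used locally.
template <typename Reducer>
double global_reduce(const std::vector<std::vector<double>> &ranks, Reducer r)
{
  double acc = Reducer::init();
  for (const auto &v : ranks) acc = r(acc, local_reduce(v, r));
  return acc;
}

// Mimics the reported problem: the local step honors the reducer,
// but the cross-rank step always sums the partial results.
template <typename Reducer>
double global_reduce_sum_only(const std::vector<std::vector<double>> &ranks, Reducer r)
{
  double acc = 0.0;
  for (const auto &v : ranks) acc += local_reduce(v, r);
  return acc;
}
```

With two ranks holding `{1, 5}` and `{3, 2}`, a maximum reduction should give 5, while the sum-only combine adds the per-rank maxima instead. For summation the two variants agree, which is why the problem only shows up for non-summation reductions.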

The rocm-devel branch (2f3b43a), built with ROCm 3.9.0, reports an error when running hisq-stencil_test: ERROR: Error in unitarization component of the hisq fattening: 484 failures

Target_HIP

Now that we have quarter precision deflation fixed on power 9, it is possible to compute a deflation space in single precision and then deflate in half or quarter precision....

The routine `computeCoarseClover` (https://github.com/lattice/quda/blob/develop/include/kernels/coarse_op_kernel.cuh#L1014) does not exploit a huge amount of parallelism as implemented, which turns into a bit of a nightmare when autotuning and could be a blocker in...

clean-up
optimization

Double, recon 12 sees a boost. Half, recon 8 sees a regression. I don't have an apples-to-apples comparison for single (different recons), but they're included for posterity. ### With dynamic...

optimization