Evan Weinberg issues

Results: 46 issues by Evan Weinberg

Use consistent "type" names (for ex, `value_type`) across QUDA

Per discussion in #1056, after the GK merge we should decide on a consistent naming convention for `Float` types, etc., used throughout the library, and make the global change....

clean-up

Investigate consolidating the copy_color_spinor_mg files

The MG `copy_color_spinor_mg.*` files currently cover an all-to-all set of (to, from) precision pairs. Historically this split was done for compile-time reasons. We should investigate if this is still a...

clean-up

Fuse multigrid coarsening routines that add elements to the diagonal

The routines to:
* Add the identity to the coarse clover
* Add a staggered mass to the coarse clover
* Add a twist to the coarse clover

Are all...

clean-up
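The three diagonal-update routines listed in this issue could, in principle, share one code path. The following is only a sketch of that idea, not QUDA code: `SiteMatrix`, `addToDiagonal`, and the three `*Shift` functors are hypothetical names, and the assumption that each variant differs only in the value added to each diagonal entry (including the sign pattern used for the twist) is an illustration rather than something the issue text confirms.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical dense n x n row-major matrix standing in for the
// coarse clover block at a single lattice site.
struct SiteMatrix {
  std::size_t n;
  std::vector<Complex> data;
  explicit SiteMatrix(std::size_t n_) : n(n_), data(n_ * n_, Complex(0.0, 0.0)) {}
  Complex &operator()(std::size_t i, std::size_t j)
  {
    assert(i < n && j < n);
    return data[i * n + j];
  }
};

// One fused routine: adds shift(i) to diagonal entry i.
template <typename Shift> void addToDiagonal(SiteMatrix &m, Shift shift)
{
  for (std::size_t i = 0; i < m.n; ++i) m(i, i) += shift(i);
}

// Assumed per-entry shifts for the three cases named in the issue.
struct IdentityShift {
  Complex operator()(std::size_t) const { return {1.0, 0.0}; }
};

struct MassShift {
  double mass; // hypothetical constant added to every diagonal entry
  Complex operator()(std::size_t) const { return {mass, 0.0}; }
};

struct TwistShift {
  double mu;        // hypothetical twist parameter
  std::size_t half; // assumed: first `half` rows get +i*mu, the rest -i*mu
  Complex operator()(std::size_t i) const { return {0.0, i < half ? mu : -mu}; }
};
```

With this shape, each existing entry point would reduce to a call such as `addToDiagonal(m, MassShift{0.1})`, so a single kernel would need tuning instead of three.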

Reminder: restore multisrc tests once multi-rhs is complete

Revert #1152 (or re-write the files as appropriate) once the multi-rhs work is done.

feature

Update `heatbath_test` to respect `--prec` when an output gauge is specified

Right now it saves a `double` precision gauge field regardless of the input precision; I need to fix that.

feature

Replace `isStaggered()`, etc, in the `Dirac` class with a virtual function implemented in each derived class

Described in title

clean-up
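As a rough illustration of what the title proposes, the sketch below replaces a centralized type query with a virtual function that each derived class overrides. The class names are hypothetical, simplified stand-ins and do not reproduce QUDA's actual `Dirac` hierarchy or its method signatures.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for a Dirac operator base class.
class DiracSketch
{
public:
  virtual ~DiracSketch() = default;
  // Default answers; derived operators override the ones that apply.
  virtual bool isStaggered() const { return false; }
  virtual bool isWilsonType() const { return false; }
};

class DiracWilsonSketch : public DiracSketch
{
public:
  bool isWilsonType() const override { return true; }
};

class DiracStaggeredSketch : public DiracSketch
{
public:
  bool isStaggered() const override { return true; }
};

// Inherits isStaggered() == true without repeating it.
class DiracImprovedStaggeredSketch : public DiracStaggeredSketch
{
};

// Callers query through the base pointer; no switch over operator types.
inline bool queryStaggered(const DiracSketch *d)
{
  assert(d != nullptr);
  return d->isStaggered();
}
```

The design point is that adding a new operator type only requires overriding the relevant query in that class, rather than updating a central list of types.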

Explore using thin-QR (Cholesky decompositions) for Gram-Schmidt

Instead of using a traditional implementation of classical or modified Gram-Schmidt (or a hybrid thereof), (block-)orthonormalization can be formulated as a thin QR, which is implemented in practice via a...

feature

optimization
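One well-known way to orthonormalize a block of vectors through a Cholesky factorization is Cholesky QR: form the Gram matrix G = VᵀV, factor G = RᵀR, and set Q = V R⁻¹. Whether this is exactly the variant the issue has in mind is an assumption, since the description is truncated. The sketch below uses real `double` vectors and plain `std::vector` storage for clarity; `cholQR`, `Block`, and `dot` are hypothetical names, not QUDA functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Block = std::vector<Vec>; // Block[j] is column vector j

inline double dot(const Vec &a, const Vec &b)
{
  assert(a.size() == b.size());
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Orthonormalizes the columns of V in place and returns the k x k
// upper-triangular factor R (as R[row][col]) such that V_in = Q * R.
inline std::vector<Vec> cholQR(Block &V)
{
  const std::size_t k = V.size();
  assert(k > 0);

  // Gram matrix G = V^T V (symmetric, so only compute the upper half).
  std::vector<Vec> G(k, Vec(k, 0.0));
  for (std::size_t i = 0; i < k; ++i)
    for (std::size_t j = i; j < k; ++j) G[i][j] = G[j][i] = dot(V[i], V[j]);

  // Cholesky factorization G = R^T R with R upper triangular.
  std::vector<Vec> R(k, Vec(k, 0.0));
  for (std::size_t i = 0; i < k; ++i) {
    for (std::size_t j = i; j < k; ++j) {
      double s = G[i][j];
      for (std::size_t p = 0; p < i; ++p) s -= R[p][i] * R[p][j];
      if (i == j) {
        assert(s > 0.0); // fails if the columns are (numerically) dependent
        R[i][i] = std::sqrt(s);
      } else {
        R[i][j] = s / R[i][i];
      }
    }
  }

  // Q = V R^{-1}: column j of Q depends only on columns 0..j, so the
  // triangular solve can overwrite V from left to right.
  for (std::size_t j = 0; j < k; ++j) {
    for (std::size_t i = 0; i < j; ++i)
      for (std::size_t x = 0; x < V[j].size(); ++x) V[j][x] -= R[i][j] * V[i][x];
    for (double &x : V[j]) x /= R[j][j];
  }
  return R;
}
```

The appeal for a GPU library is that the Gram matrix is a single block reduction rather than a chain of dependent dot products. The known trade-off is that forming VᵀV squares the condition number, so ill-conditioned blocks may need a second pass or a fallback.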

Staggered fused operator feature request

* Modify dirac_[improved_]staggered.cpp to use the full operator for calling `MdagM` as opposed to separate even/odd parts. In theory this does the right thing: ``` Dslash(*tmp1, in, QUDA_INVALID_PARITY); DslashXpay(out, *tmp1,...

feature

optimization

Add fine-grained parallelism + matrix tiling to computeCoarseClover

The routine `computeCoarseClover` (https://github.com/lattice/quda/blob/develop/include/kernels/coarse_op_kernel.cuh#L1014) does not exploit much parallelism as implemented, which turns into a bit of a nightmare when autotuning and could be a blocker in...

clean-up

optimization

sm_80: preconditioned twisted clover w/dynamic clover is slower in half precision, recon 8 than w/out dynamic clover

Double, recon 12 sees a boost. Half, recon 8 sees a regression. I don't have an apples-to-apples comparison for single (different recons), but they're included for posterity. ### With dynamic...

optimization