
Explore NVPTX's sched4reg

Open · maleadt opened this issue 4 years ago · 3 comments

```cpp
static cl::opt<bool> sched4reg(
    "nvptx-sched4reg",
    cl::desc("NVPTX Specific: schedule for register pressue"), cl::init(false));
```

maleadt, Nov 16 '21 09:11

Also:

```cpp
// LSV is still relatively new; this switch lets us turn it off in case we
// encounter (or suspect) a bug.
// TODO/NOTE: don't want this when under register pressure
static cl::opt<bool>
    DisableLoadStoreVectorizer("disable-nvptx-load-store-vectorizer",
                               cl::desc("Disable load/store vectorizer"),
                               cl::init(false), cl::Hidden);
```

maleadt, Nov 16 '21 09:11

@wsmoses and I had a good benchmark for this in https://github.com/wsmoses/Enzyme-GPU-Tests/tree/main/DG/cuda, where the reverse-mode (gradient) code performed notably worse on CUDA.jl than on AMDGPU.jl.

vchuravy, Dec 06 '21 18:12

You can easily try this with `LLVM.clopts("--nvptx-sched4reg")`.

maleadt, Dec 09 '21 14:12