MIOpen
AMD's Machine Intelligence Library
- Added Diag Forward operations - Added driver test and gtest for Diag operations The kernel is only 20% faster than ROCm if the following constraints are applied: - tensor...
- Added backward Var operation and kernel. - Added driver test and gtest for Var. When comparing the newly developed MIOpen Var kernel with ROCm, there is a performance improvement for a...
More test cases added for better coverage. Here's the summary: 1. N C H W: 128 256 14 14 Covers: backward_spatial_single.cpp: variant == 3 and variant == 1 (2nd) [128...
Hi, I am running vLLM on my 7900XTX (gfx1100). I use ```vllm serve ./qwen2-vl-instruct-pytorch-7b --dtype auto --port 8000 --limit_mm_per_prompt image=4 --max_model_len 8784 --gpu_memory_utilization 0.9``` But then it shows errors: ``` $...
### Purpose This project exists to minimize our reliance on compile time parameterization in MIOpen's source kernels. The goal isn't to sacrifice performance, but rather determine ways of reducing...
Need to assess whether this is feasible. Two types of defaults exist: 1. Some solvers have hard-coded defaults -- one kernel for everything. 2. Others are configured at runtime using...
Let's consider this: ``` ./bin/MIOpenDriver conv -c 192 -H 28 -W 28 -y 5 -x 5 -k 32 -n 17 -p 1 -q 1 -v 1 -u 1 -F 1...