Epliz
Hi, I have recently been using the Radeon GPU Profiler on Windows to optimize OpenCL/HIP kernels on RDNA2, and it has been very useful thanks to its nice visualization...
Hi, The profiling capabilities are quite frankly great on Windows, with the instruction tracing being especially useful. Would it be possible for you to consider adding similar functionality on...
### 🐛 Describe the bug Hi, I profiled text generation with the Mistral 7b LLM on my MI100 GPU and saw that some gemv fp16 kernels don't seem...
### 🐛 Describe the bug Hi, When doing text generation with Mistral 7b with Hugging Face transformers on a MI100 GPU, I can see in the collected torch trace that a...
### Describe the bug As described in the title, rocblas_gemm_ex seems quite suboptimal on MI100 when m==1, inputs/outputs are fp16, and compute is fp32. A quite naive kernel I implemented...
Hi, Just to check whether I set up my machine with a MI100 GPU correctly, I ran the "AI Benchmark" from https://ai-benchmark.com/ranking_deeplearning_detailed.html . The inference speed is pretty good, but...
Hi, At https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/zero_inference/README.md , the referenced import `from deepspeed.compression.inference.quantization import _init_group_wise_weight_quantization` is wrong. The correct one is `from deepspeed.inference.quantization import _init_group_wise_weight_quantization` . Could you please correct it? Best regards, Epliz