Epliz
Hi, I have recently been using the Radeon GPU Profiler on Windows to optimize OpenCL/HIP kernels on RDNA2, and it has been very useful thanks to its nice visualization...
Hi, The profiling capabilities are quite frankly great on Windows, with the instruction tracing being especially useful. Would it be possible for you to consider adding similar functionality on...
### 🐛 Describe the bug Hi, I profiled text generation with the Mistral 7b LLM on my MI100 GPU and saw that some gemv fp16 kernels don't seem...
### 🐛 Describe the bug Hi, When doing text generation with Mistral 7b with Hugging Face transformers on a MI100 GPU, I can see in the collected torch trace that a...
### Describe the bug As described in the title, rocblas_gemm_ex seems quite suboptimal on MI100 when m==1, inputs/outputs are fp16, and compute is fp32. A quite naive kernel I implemented...
Hi, Just to check whether I set up my machine with a MI100 GPU correctly, I ran the "AI Benchmark" from https://ai-benchmark.com/ranking_deeplearning_detailed.html . The inference speed is pretty good, but...
Hi, At https://github.com/microsoft/DeepSpeedExamples/blob/master/inference/huggingface/zero_inference/README.md , the referenced import `from deepspeed.compression.inference.quantization import _init_group_wise_weight_quantization` is wrong. The correct one is `from deepspeed.inference.quantization import _init_group_wise_weight_quantization` . Could you please correct it? Best regards, Epliz