Yuanqiang Liu

Results: 13 issues by Yuanqiang Liu

There is about a 20% performance difference between the CUTLASS profiler's GemmUniversal kernel and my Gemm kernel (they appear to be the same kernel). **GPU: T4, persistent mode: ON, locked at 1590 MHz** NVCC: 11.1...

question
inactive-30d

Fold `mhlo.transpose` with non-splat constant.

awaiting review
comp:xla
size:M

Fuse dilated conv2d with fp16. BTW, I have two questions: 1. Why should the batch dimension not be dynamic? 2. Why is the padding mode set to `SAME` when...

awaiting review
size:S

The code is copied directly from `tensorflow/compiler/mlir/lite/stablehlo/transforms/unfuse_batch_norm_pass.cc`.

stat:awaiting response
comp:xla
size:M

… to log an ERROR message when calling SetPriority on a host stream

ready to pull
comp:xla
size:XS

# Checklist
- [x] The title and commit message(s) are descriptive.
- [ ] Small commits made to fix your PR have been squashed to avoid history pollution.
- [...

Bug
Needs revision

* to distinguish which `mhlo.slice` ops could be non-strided subviews

enhancement