Yuanqiang Liu issues

Results: 13 issues of Yuanqiang Liu

[QST] Profiling difference between GemmUniversal and Gemm?

There is about a 20% performance difference between the CUTLASS profiler's GemmUniversal kernel and my Gemm kernel (they look like the same kernel). **GPU: T4, persistent mode: ON, locked on 1590MHz** NVCC: 11.1...

question

inactive-30d

[MHLO] transpose fold non-splat constant

Fold `mhlo.transpose` with non-splat constant.

awaiting review

comp:xla

size:M

[mlir/lite] fuse dilated conv2d with fp16

Fuse dilated conv2d with fp16. BTW, I have two questions to ask: 1. Why should the batch dimension not be dynamic? 2. Why is the padding mode set to `SAME` when...

awaiting review

size:S

[MHLO] let unfuse-batch-norm-inference produce bcast_mul+bcast_add

The code is copied from `tensorflow/compiler/mlir/lite/stablehlo/transforms/unfuse_batch_norm_pass.cc`

stat:awaiting response

comp:xla

size:M

[XLA][StreamExecutor] add empty implementation for host stream, avoid…

… logging an ERROR message when SetPriority is called on a host stream

ready to pull

comp:xla

size:XS

fix memory leak of npy_file's move assignment

# Checklist
- [x] The title and commit message(s) are descriptive.
- [ ] Small commits made to fix your PR have been squashed to avoid history pollution.
- [...

Bug

Needs revision

Fix return tuple with one element

WIP

[compiler] add util function isSliceSubviewWithoutStride

[torch-frontend] refactor requirements.txt