Ying Zhang
Thanks for your contribution! From the stack trace, it seems the input tensor's dtype is fp16 instead of fp32. Some earlier operator may have hard-coded fp16 as its output dtype...
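A minimal sketch of the kind of dtype guard that surfaces this error. This is illustrative only, not AITemplate's actual API: the function name and the dtype strings are assumptions.

```python
# Hypothetical dtype check performed before launching a kernel that was
# compiled for a specific precision. Names here are illustrative, not AIT's.
def check_input_dtype(actual_dtype: str, expected_dtype: str = "float32") -> None:
    if actual_dtype != expected_dtype:
        raise TypeError(
            f"expected {expected_dtype} input, got {actual_dtype}; "
            "an upstream operator may have hard-coded its output dtype"
        )

check_input_dtype("float32")  # matches, no error
```

Passing `"float16"` here raises `TypeError`, which is the shape of failure the stack trace suggests: the kernel expected fp32 but received an fp16 tensor produced upstream.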
cc @fsx950223
I think for individual gemm kernels, AIT should perform similarly to rocBLAS. AIT's perf gains mostly come from operator fusion.
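To illustrate why fusion (rather than individual kernels) is where the gains come from, here is a tiny pure-Python sketch of the idea: an unfused pipeline materializes an intermediate and makes two passes over the data, while the fused version does the same math in one pass. The function names are made up for illustration; in AIT the fused ops are GPU kernels (e.g. a gemm with its epilogue), not Python loops.

```python
# Unfused: two separate "kernels", with an intermediate written and re-read.
def scale_then_shift_unfused(xs):
    ys = [x * 2.0 for x in xs]       # pass 1: produce intermediate
    return [y + 1.0 for y in ys]     # pass 2: consume it

# Fused: one "kernel", no intermediate traffic.
def scale_then_shift_fused(xs):
    return [x * 2.0 + 1.0 for x in xs]
```

Both return the same result; the fused form simply avoids the extra memory round-trip, which on a GPU is often the dominant cost for memory-bound epilogues.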
btw AIT doesn't run well on ROCm even on Linux right now. There is an ongoing PR to support ROCm on Linux: https://github.com/facebookincubator/AITemplate/pull/146. cc @asroy, @fsx950223
Thanks @fsx950223 for your fix and for adding the AMD CI! For some reason the CircleCI pipeline fails; I'll manually merge the PR into our internal repo and run...
Also, the AMD CI doesn't seem to be triggered. Has it been enabled successfully? @fsx950223
Unfortunately, not all kernels run well on SM75 GPUs. Check this README: https://fburl.com/pimcs20r.
@carlushuang Please send a PR and we'll merge your fix into upstream, thanks!
The AIT runtime has two parts: a CPU part and a GPU part. The CPU part relies on num_runtimes for parallelization, while the GPU part relies on streams for parallelization. It's valid to...
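A conceptual sketch of the CPU-side half of this scheme, assuming nothing about AIT's real classes: `num_runtimes` bounds how many requests are prepared concurrently, and each runtime slot would own its own GPU stream so device work can overlap. The `RuntimePool` name and its methods are invented for illustration.

```python
import queue

class RuntimePool:
    """Illustrative only (not AIT's API): a pool of `num_runtimes` slots.
    Each slot stands in for one runtime instance plus its GPU stream."""

    def __init__(self, num_runtimes: int):
        self._free = queue.Queue()
        for rt_id in range(num_runtimes):
            self._free.put(rt_id)

    def run(self, fn):
        rt_id = self._free.get()   # blocks when all runtimes are busy,
        try:                       # which is how num_runtimes caps concurrency
            return fn(rt_id)       # fn would enqueue GPU work on rt_id's stream
        finally:
            self._free.put(rt_id)  # return the runtime for the next request
```

Callers (e.g. one thread per inference request) share the pool; CPU-side parallelism is capped by the pool size, while GPU-side overlap comes from each runtime enqueueing onto its own stream.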
I think in the multi-process case, each process may write to the same GPU memory, which causes errors. I need to check the detailed error message to confirm. Regarding dynamic batching support:...