Jianyu Huang

20 comments by Jianyu Huang

Closing this issue since it is resolved.

@shintaro-iwasaki added the CUDA 11.5 test coverage in https://github.com/pytorch/FBGEMM/pull/1234. The OSS build/test looks to be passing: see, e.g., https://github.com/pytorch/FBGEMM/runs/7643110616?check_suite_focus=true

Do you have perf improvement results on AMD GPUs with this PR? cc @shintaro-iwasaki @sryap @MohammadMahdiJavanmard

> > Hi @liligwu Thanks for your update. If we could use `4 * kWarpSize` (see above), it would be easier to maintain this parameter for both AMD and NVIDIA...
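
For context, here is a minimal sketch of how a platform-dependent `kWarpSize` can keep an expression like `4 * kWarpSize` portable across vendors; the macro check and the block-size constant below are illustrative, not FBGEMM's exact code:

```cpp
// Illustrative sketch: pick the per-platform SIMT width at compile time so a
// single tuning expression like 4 * kWarpSize works on both vendors' GPUs.
#if defined(__HIP_PLATFORM_AMD__)
constexpr int kWarpSize = 64; // AMD wavefront size
#else
constexpr int kWarpSize = 32; // NVIDIA warp size
#endif

// Hypothetical tuning parameter expressed in warps rather than raw threads:
constexpr int kThreadsPerBlock = 4 * kWarpSize; // 128 on NVIDIA, 256 on AMD
```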

FBGEMM depends on a PyTorch installation. Have you installed PyTorch already?

```python
>>> import torch
>>> import fbgemm_gpu
```

> hi @jianyuh I found that the parameter data types of `transpose_avx512` and `transpose_simd` are not aligned.
>
> ```cpp
> template <typename T>
> FBGEMM_API void transpose_simd(
>     unsigned M,
> ...
> ```
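
To illustrate the alignment being requested, here is a hypothetical pair of declarations in which both entry points share the same parameter types; the exact types FBGEMM uses may differ, and `FBGEMM_API` is stubbed so the sketch stands alone:

```cpp
#ifndef FBGEMM_API
#define FBGEMM_API // FBGEMM's export macro; stubbed here for a standalone sketch
#endif

// Hypothetical aligned declarations: when both entry points take identical
// parameter types, one can forward to the other without narrowing casts.
template <typename T>
FBGEMM_API void transpose_simd(
    unsigned M, unsigned N, const T* src, unsigned ld_src, T* dst, unsigned ld_dst);

template <typename T>
FBGEMM_API void transpose_avx512(
    unsigned M, unsigned N, const T* src, unsigned ld_src, T* dst, unsigned ld_dst);
```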

@CaoE Thanks a lot for the awesome contribution!

@dbl001 Could you comment out line 125 of /Users/davidlaxer/pytorch/FBGEMM/bench/GEMMsBenchmark.cc? That line, `((volatile char*)(llc.data()));`, flushes the caches so the benchmark measurements are more accurate. It doesn't affect the library build.
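
For readers wondering what that line is for, here is a minimal sketch of the cache-flushing idea; the function and buffer names are illustrative, not FBGEMM's exact benchmark utilities:

```cpp
#include <cstddef>
#include <vector>

// Touch every byte of a buffer sized larger than the last-level cache so
// previously cached benchmark data is evicted before the next timed run.
void flush_llc(std::vector<char>& llc) {
  volatile char* data = llc.data(); // volatile keeps the compiler from
  for (std::size_t i = 0; i < llc.size(); ++i) {
    data[i] += 1;                   // optimizing the writes away
  }
}

// Usage: allocate well above typical LLC capacity and flush between runs.
// std::vector<char> llc(128 * 1024 * 1024);
// flush_llc(llc);
```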

Yes: previously we tuned something like https://github.com/pytorch/FBGEMM/pull/82/files (that one is for AVX2). You can adjust the parameters for your customized HW.
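
As a rough sketch of what such tuning looks like (the parameter names follow FBGEMM's packing-traits convention, but the values here are hypothetical, not the ones from that PR):

```cpp
// Hypothetical cache-blocking parameters for an AVX2 target. Tuning means
// sizing the M/K/N blocks to the core's L1/L2 caches and matching the
// register tile to the SIMD width (8 fp32 lanes per AVX2 register).
struct Avx2TuningParams {
  int MCB = 120; // rows of the packed A block
  int KCB = 512; // shared inner (K) block depth
  int NCB = 8;   // columns of the packed B block
  int MR = 12;   // register-tile rows
  int NR = 8;    // register-tile columns (one AVX2 vector of fp32)
};
```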

Hi @jjohnson-arm, sorry, I didn't see this earlier. I think the tutorial was compatible with HuggingFace in late 2019 / early 2020. We need to update it to the latest...