Jianyu Huang

20 comments by Jianyu Huang

Closing this issue since it is resolved.

@shintaro-iwasaki added the CUDA 11.5 test coverage in https://github.com/pytorch/FBGEMM/pull/1234. The OSS build/test looks to be passing: see, e.g., https://github.com/pytorch/FBGEMM/runs/7643110616?check_suite_focus=true

Do you have perf improvement results on AMD GPUs with this PR? cc @shintaro-iwasaki @sryap @MohammadMahdiJavanmard

> > Hi @liligwu Thanks for your update. If we could use `4 * kWarpSize` (see above), it would be easier to maintain this parameter for both AMD and NVIDIA...
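
For context, here is a minimal sketch of how a platform-dependent `kWarpSize` can keep an expression like `4 * kWarpSize` portable across vendors; the macro check and the block-size constant below are illustrative, not FBGEMM's exact code:

```cpp
// Illustrative sketch: pick the per-platform SIMT width at compile time so a
// single tuning expression like 4 * kWarpSize works on both vendors' GPUs.
#if defined(__HIP_PLATFORM_AMD__)
constexpr int kWarpSize = 64; // AMD wavefront size
#else
constexpr int kWarpSize = 32; // NVIDIA warp size
#endif

// Hypothetical tuning parameter expressed in warps rather than raw threads:
constexpr int kThreadsPerBlock = 4 * kWarpSize; // 128 on NVIDIA, 256 on AMD
```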

FBGEMM depends on a PyTorch installation. Have you installed PyTorch already?

```python
>>> import torch
>>> import fbgemm_gpu
```

> hi @jianyuh I found that the parameter data types of `transpose_avx512` and `transpose_simd` are not aligned.
>
> ```cpp
> template <typename T>
> FBGEMM_API void transpose_simd(
>     unsigned M,
> ...
> ```
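
To illustrate the alignment being requested, here is a hypothetical pair of declarations in which both entry points share the same parameter types; the exact types FBGEMM uses may differ, and `FBGEMM_API` is stubbed so the sketch stands alone:

```cpp
#ifndef FBGEMM_API
#define FBGEMM_API // FBGEMM's export macro; stubbed here for a standalone sketch
#endif

// Hypothetical aligned declarations: when both entry points take identical
// parameter types, one can forward to the other without narrowing casts.
template <typename T>
FBGEMM_API void transpose_simd(
    unsigned M, unsigned N, const T* src, unsigned ld_src, T* dst, unsigned ld_dst);

template <typename T>
FBGEMM_API void transpose_avx512(
    unsigned M, unsigned N, const T* src, unsigned ld_src, T* dst, unsigned ld_dst);
```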

@CaoE Thanks a lot for the awesome contribution!

@dbl001 Could you comment out line 125 of /Users/davidlaxer/pytorch/FBGEMM/bench/GEMMsBenchmark.cc? That line, `((volatile char*)(llc.data()));`, flushes the caches so the benchmark measurements are more accurate. It doesn't affect the library build.
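
For readers wondering what that line is for, here is a minimal sketch of the cache-flushing idea; the function and buffer names are illustrative, not FBGEMM's exact benchmark utilities:

```cpp
#include <cstddef>
#include <vector>

// Touch every byte of a buffer sized larger than the last-level cache so
// previously cached benchmark data is evicted before the next timed run.
void flush_llc(std::vector<char>& llc) {
  volatile char* data = llc.data(); // volatile keeps the compiler from
  for (std::size_t i = 0; i < llc.size(); ++i) {
    data[i] += 1;                   // optimizing the writes away
  }
}

// Usage: allocate well above typical LLC capacity and flush between runs.
// std::vector<char> llc(128 * 1024 * 1024);
// flush_llc(llc);
```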

Yes: previously we tuned something like https://github.com/pytorch/FBGEMM/pull/82/files (that one is for AVX2). You can adjust the parameters for your customized HW.
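
As a rough sketch of what such tuning looks like (the parameter names follow FBGEMM's packing-traits convention, but the values here are hypothetical, not the ones from that PR):

```cpp
// Hypothetical cache-blocking parameters for an AVX2 target. Tuning means
// sizing the M/K/N blocks to the core's L1/L2 caches and matching the
// register tile to the SIMD width (8 fp32 lanes per AVX2 register).
struct Avx2TuningParams {
  int MCB = 120; // rows of the packed A block
  int KCB = 512; // shared inner (K) block depth
  int NCB = 8;   // columns of the packed B block
  int MR = 12;   // register-tile rows
  int NR = 8;    // register-tile columns (one AVX2 vector of fp32)
};
```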

Hi @jjohnson-arm, sorry, I didn't see this earlier. I think the tutorial was compatible with HuggingFace in late 2019 / early 2020. We need to update it to the latest...