
fbgemm_gpu_py.so not found

Open ammaryasirnaich opened this issue 2 years ago • 6 comments

While importing torchrec I am getting an error:

import torchrec
File fbgemm_gpu_py.so not found

My development environment is as below

Python: 3.8.13 | packaged by conda-forge | [GCC 10.3.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.12.0a0+bd13bc6
PyTorch compiling details: PyTorch built with:
  - GCC 9.4
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.6
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_86,code=compute_86
  - CuDNN 8.4
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.4.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.13.0a0
OpenCV: 4.5.5
> pip list | grep fbgemm
fbgemm-gpu                    0.2.0
> pip list | grep torch
pytorch-quantization          2.1.2
torch                         1.12.0a0+bd13bc6
torch-tensorrt                1.1.0a0
torchmetrics                  0.9.3
torchrec                      0.2.0
torchtext                     0.13.0a0
torchvision                   0.13.0a0
torchx-nightly                2022.8.15

I would appreciate the help!

ammaryasirnaich avatar Aug 19 '22 10:08 ammaryasirnaich
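
A quick way to narrow down a "fbgemm_gpu_py.so not found" error is to check whether the installed fbgemm_gpu package actually ships any shared objects. The following is a minimal sketch, not an official diagnostic: the helper name `find_shared_objects` is hypothetical, and it assumes the library would live somewhere under the package directory. It uses `importlib.util.find_spec`, so the package is located without being imported.

```python
import importlib.util
from pathlib import Path


def find_shared_objects(package_name):
    """Return the .so files inside an installed package without importing it.

    Hypothetical helper for illustration. Returns None when the name is not
    an installed package directory, otherwise a (possibly empty) list of paths.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    found = []
    for location in spec.submodule_search_locations:
        # Search the package directory recursively for shared objects.
        found.extend(sorted(Path(location).rglob("*.so")))
    return found


if __name__ == "__main__":
    libs = find_shared_objects("fbgemm_gpu")
    if libs is None:
        print("fbgemm_gpu is not installed in this environment")
    elif not libs:
        print("fbgemm_gpu is installed but contains no .so files")
    else:
        for lib in libs:
            print(lib)
```

If the list is empty or does not include fbgemm_gpu_py.so, the installed wheel likely does not match the environment, which would be consistent with a re-install resolving the problem later in this thread.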

Same issue; here are my settings:

Python 3.8.10

fbgemm-gpu                    0.2.0
torch                         1.12.1
torchmetrics                  0.9.3
torchrec                      0.2.0
torchvision                   0.13.1
torchx-nightly                2022.8.30

rdmerillat avatar Aug 31 '22 16:08 rdmerillat

Does this have something to do with the Python version? This post seems to suggest that fbgemm_gpu won't work unless it's on Python 3.7?

rdmerillat avatar Sep 01 '22 19:09 rdmerillat

I am not sure about that. I need fbgemm installed because it is one of the dependencies required by TorchRec. TorchRec's prerequisites list Python >= 3.7, so I think it should work on Python versions above 3.7.

ammaryasirnaich avatar Sep 04 '22 10:09 ammaryasirnaich

I have just re-installed the torchrec package and I think the problem is solved. The TorchRec community must have fixed it.

ammaryasirnaich avatar Sep 04 '22 11:09 ammaryasirnaich

That is probably because I'm trying to get torchrec going inside a merlin-pytorch image, in order to have nvtabular alongside torchrec, but I still haven't been able to get this working. I'm sure it's some versioning difference between the image and what is required, but if anyone has any input on getting NVT and torchrec working together, I'd love to hear about it.

rdmerillat avatar Sep 06 '22 19:09 rdmerillat

Well, I'm not sure. However, the Dockerfile suggests it's using the pytorch-base image 22.07; you can try downgrading it to 22.04 to see if that helps.

ammaryasirnaich avatar Sep 06 '22 20:09 ammaryasirnaich

Closing this ticket as a lot of changes have taken place since it was filed. Specifically, we now only support Python 3.8 - 3.11, CUDA 11.7-11.8, and PyTorch 2.0+. For comprehensive instructions on how to install FBGEMM_GPU, please refer to the Installation Instructions. Please feel free to file a new ticket if you run into further issues with FBGEMM_GPU installation.

q10 avatar May 04 '23 18:05 q10
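
The support ranges quoted in the closing comment (Python 3.8 - 3.11, CUDA 11.7-11.8, PyTorch 2.0+) can be turned into a small pre-install check. This is a rough sketch only: the function names `major_minor` and `check_environment` are hypothetical, and the ranges are copied from that comment as of May 2023, so they may no longer be current.

```python
import re


def major_minor(version):
    """Extract (major, minor) from strings such as '1.12.0a0+bd13bc6' or '11.6'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def check_environment(python_version, torch_version, cuda_version):
    """Return a list of mismatches against the ranges quoted in the closing comment.

    Hypothetical helper; the ranges are assumptions taken from this thread.
    """
    problems = []
    if not (3, 8) <= major_minor(python_version) <= (3, 11):
        problems.append(f"Python {python_version} is outside 3.8 - 3.11")
    if major_minor(torch_version) < (2, 0):
        problems.append(f"PyTorch {torch_version} is older than 2.0")
    if not (11, 7) <= major_minor(cuda_version) <= (11, 8):
        problems.append(f"CUDA {cuda_version} is outside 11.7 - 11.8")
    return problems


if __name__ == "__main__":
    # Values taken from the original report in this thread.
    for problem in check_environment("3.8.13", "1.12.0a0+bd13bc6", "11.6"):
        print(problem)
```

On a live system the arguments could come from `platform.python_version()`, `torch.__version__`, and `torch.version.cuda`; note that `torch.version.cuda` is `None` on CPU-only builds, which this sketch does not handle.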