feat: TensorRT support

Open oocococo opened this issue 2 years ago • 17 comments

Description

TensorRT is also an important component of CUDA Toolkit. It would be great if you could implement a function like install.cuda(tensorrt=7.2.3)


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

oocococo avatar Oct 28 '22 01:10 oocococo

Thanks for raising this!

gaocegege avatar Oct 28 '22 03:10 gaocegege

/assign

aseaday avatar Nov 01 '22 03:11 aseaday

We now have a library for this: https://github.com/tensorchord/envdlib/pull/10/files

aseaday avatar Nov 01 '22 11:11 aseaday

Do you need to install the bindings for your DL framework?

aseaday avatar Nov 01 '22 11:11 aseaday

How can I test if it works?

The error message from my setup (without envdlib) is:

2022-11-01 11:46:15.238959: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory
2022-11-01 11:46:15.239207: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvrtc.so.11.1: cannot open shared object file: No such file or directory

And I tried a build.envd like

envdlib = include("https://github.com/aseaday/envdlib")
def build():
    config.pip_index(url="https://mirror.sjtu.edu.cn/pypi/web/simple")
    base(os="ubuntu20.04", language="python3.8")
    envdlib.tensorrt(os="20.04", cuda="11.2.1", trt="7.2.3")
...

But envd build failed with Error: module has no .tensorrt field or method

oocococo avatar Nov 01 '22 11:11 oocococo

@oocococo You can copy the function content directly to your file as a temporary test.

def tensorrt(os="20.04", cuda="11.6.2", trt="8.4.3.1"):
    """Install tensorrt

    Args:
        os (Optional[str]): os version
        cuda (Optional[str]): cuda version
        trt (Optional[str]): tensorrt version
    """
    run(
        [
            'CUDA_VERSION=%s && TRT_VERSION=%s && \
if [ "${CUDA_VERSION}" = "10.2" ] ; then \
    v="${TRT_VERSION%.*}-1+cuda${CUDA_VERSION}" &&\
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub &&\
    apt-get update &&\
    sudo apt-get install libnvinfer8=${v} libnvonnxparsers8=${v} libnvparsers8=${v} libnvinfer-plugin8=${v} \
        libnvinfer-dev=${v} libnvonnxparsers-dev=${v} libnvparsers-dev=${v} libnvinfer-plugin-dev=${v} \
        python3-libnvinfer=${v}; \
else \
    v="${TRT_VERSION%.*}-1+cuda${CUDA_VERSION%.*}" &&\
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub &&\
    apt-get update &&\
    sudo apt-get install libnvinfer8=${v} libnvonnxparsers8=${v} libnvparsers8=${v} libnvinfer-plugin8=${v} \
        libnvinfer-dev=${v} libnvonnxparsers-dev=${v} libnvparsers-dev=${v} libnvinfer-plugin-dev=${v} \
        python3-libnvinfer=${v}; \
fi'
            % (cuda, trt)
        ]
    )

VoVAllen avatar Nov 01 '22 12:11 VoVAllen
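
For readers following along, the version pin that the shell snippet above builds can be sketched in plain Python. This is only an illustration of the `${VAR%.*}` suffix stripping used in the function; the helper name apt_pin is hypothetical and is not part of envdlib.

```python
def strip_last_component(version):
    """Mimic the shell expansion ${VAR%.*}: drop the last dot-separated part."""
    return version.rsplit(".", 1)[0]


def apt_pin(cuda="11.6.2", trt="8.4.3.1"):
    """Build the apt version string the snippet assigns to v (hypothetical helper)."""
    # The snippet keeps the CUDA version untouched only when it is exactly "10.2".
    cuda_part = cuda if cuda == "10.2" else strip_last_component(cuda)
    return "%s-1+cuda%s" % (strip_last_component(trt), cuda_part)


if __name__ == "__main__":
    print(apt_pin())
```

With the defaults shown in the function signature, the pin comes out as 8.4.3-1+cuda11.6, which is the version suffix passed to each libnvinfer8 package.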

@VoVAllen did you mean I should execute the run part manually?

oocococo avatar Nov 01 '22 12:11 oocococo

It's just a Python function. You can copy it into your file as a function declaration and call tensorrt(....) directly in your build function.

VoVAllen avatar Nov 01 '22 12:11 VoVAllen

@aseaday line 27 of the function seems buggy

Error: not enough arguments for format string

oocococo avatar Nov 02 '22 01:11 oocococo
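
A plausible cause of this error, sketched in plain Python: the shell template contains `${TRT_VERSION%.*}`, and the `%` operator reads `%.*` as one more format specifier that wants an extra argument. This assumes envd's Starlark `%` formatting behaves like Python's here, which the matching error message suggests but the thread does not confirm. The template below is a shortened, hypothetical reduction of the real one.

```python
# Shortened stand-in for the shell template in tensorrt(); not the full command.
UNESCAPED = 'CUDA_VERSION=%s && TRT_VERSION=%s && v="${TRT_VERSION%.*}-1+cuda${CUDA_VERSION%.*}"'

# Same template with the literal percent signs doubled so they survive formatting.
ESCAPED = 'CUDA_VERSION=%s && TRT_VERSION=%s && v="${TRT_VERSION%%.*}-1+cuda${CUDA_VERSION%%.*}"'


def render(template, cuda, trt):
    """Fill the two %s placeholders with the CUDA and TensorRT versions."""
    return template % (cuda, trt)


if __name__ == "__main__":
    try:
        render(UNESCAPED, "11.6.2", "8.4.3.1")
    except TypeError as err:
        print("unescaped template fails:", err)
    print(render(ESCAPED, "11.6.2", "8.4.3.1"))
```

Doubling each literal `%` is one way to avoid the problem; building the command with string concatenation would be another.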

Author

Sorry for the inconvenience. You can now use it like this:

envdlib = include("https://github.com/tensorchord/envdlib")
def build():
    base(os="ubuntu20.04", language="python3")
    shell("zsh")
    install.cuda(version="11.6.2", cudnn="8")
    envdlib.tensorrt()

aseaday avatar Nov 02 '22 09:11 aseaday

I've updated envd and used the default envdlib.tensorrt(), but I still get error messages in my ipynb:

2022-11-02 13:24:28.395692: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-02 13:24:28.521478: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-02 13:24:28.556504: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2022-11-02 13:24:29.198706: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-11-02 13:24:29.198832: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-11-02 13:24:29.198840: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

I just ran an example ipynb from sktime.

oocococo avatar Nov 02 '22 13:11 oocococo
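
One way to check inside the environment which TensorRT library versions the dynamic loader can actually find is to try opening them with ctypes. This is a minimal sketch: the `.so.7` names come from the log above, while the `.so.8` names are an assumption about what a TensorRT 8 install provides.

```python
import ctypes

# Library names to probe. The .so.7 entries appear in the TensorFlow log;
# the .so.8 entries are assumed to be what a TensorRT 8 install ships.
CANDIDATES = (
    "libnvinfer.so.7",
    "libnvinfer_plugin.so.7",
    "libnvinfer.so.8",
    "libnvinfer_plugin.so.8",
)


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def report(sonames=CANDIDATES):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in sonames}


if __name__ == "__main__":
    for name, ok in report().items():
        print(name, "loadable" if ok else "NOT loadable")
```

If only the `.so.8` names load while TensorFlow asks for `.so.7`, the installed TensorRT major version does not match the one TensorFlow was built against.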

I will check it. It seems to be a problem with the TensorFlow binding to TensorRT.

aseaday avatar Nov 04 '22 06:11 aseaday

Could you share your build.envd? I found that your TensorFlow tries to load TensorRT 7, but we only installed TensorRT 8.

aseaday avatar Nov 04 '22 09:11 aseaday

@aseaday sorry for the delay

envdlib = include("https://github.com/tensorchord/envdlib")
def build():
    config.pip_index(url="https://mirror.sjtu.edu.cn/pypi/web/simple")
    base(os="ubuntu20.04", language="python3.8")
    config.apt_source(source="""
# See http://help.ubuntu.com/community/UpgradeNotes for how to upgrade to
# newer versions of the distribution.
    deb https://mirror.sjtu.edu.cn/ubuntu focal main restricted
# deb-src https://mirror.sjtu.edu.cn/ubuntu focal main restricted
    deb https://mirror.sjtu.edu.cn/ubuntu focal-updates main restricted
# deb-src https://mirror.sjtu.edu.cn/ubuntu focal-updates main restricted
    deb https://mirror.sjtu.edu.cn/ubuntu focal universe
# deb-src https://mirror.sjtu.edu.cn/ubuntu focal universe
    deb https://mirror.sjtu.edu.cn/ubuntu focal-updates universe
# deb-src https://mirror.sjtu.edu.cn/ubuntu focal-updates universe
    deb https://mirror.sjtu.edu.cn/ubuntu focal multiverse
# deb-src https://mirror.sjtu.edu.cn/ubuntu focal multiverse
    deb https://mirror.sjtu.edu.cn/ubuntu focal-updates multiverse
# deb-src https://mirror.sjtu.edu.cn/ubuntu focal-updates multiverse
    deb https://mirror.sjtu.edu.cn/ubuntu focal-backports main restricted universe multiverse
# deb-src https://mirror.sjtu.edu.cn/ubuntu focal-backports main restricted universe multiverse
    deb http://archive.canonical.com/ubuntu focal partner
# deb-src http://archive.canonical.com/ubuntu focal partner
    deb https://mirror.sjtu.edu.cn/ubuntu focal-security main restricted universe multiverse
# deb-src https://mirror.sjtu.edu.cn/ubuntu focal-security main restricted universe multiverse
    """)
    install.python_packages(name = [
            "sktime[dl,all-extras]",
            "seaborn",
            "pyts",
            "ipykernel"
    ])
    install.vscode_extensions([
            "ms-python.python",
            "ms-toolsai.jupyter",
#            "GitHub.copilot"
    #        "ms-python.vscode-pylance"
    ])
    install.cuda(version="11.6.2", cudnn="8")
    envdlib.tensorrt()
    shell("zsh")
    install.apt_packages(name = [
            "linux-libc-dev",
            "htop",
            "fzf"
    ])

oocococo avatar Nov 08 '22 03:11 oocococo

I found that this is caused by TensorFlow using TensorRT 7. We don't have an automatic way to install TensorRT 7, because NVIDIA does not provide an apt package for it. Let me think of a way to do that.

aseaday avatar Nov 11 '22 08:11 aseaday

@aseaday I made it work by installing the TensorRT tar package manually, following the official tutorial (https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-tar). But during the installation, steps like ldconfig confused me. Maybe envd can do all of this automatically?

oocococo avatar Nov 22 '22 09:11 oocococo
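
For context on the ldconfig step: after unpacking a tar release, the loader has to be told where the shared libraries live. A rough sketch of the two shell commands involved, generated from Python, is below. The directory /opt/TensorRT/lib and the file name tensorrt.conf are hypothetical placeholders, and nothing here is an existing envd or envdlib feature.

```python
import shlex


def ldconfig_commands(lib_dir, conf_name="tensorrt.conf"):
    """Return shell commands that register lib_dir with the dynamic loader.

    lib_dir is wherever the tar release's lib/ directory ended up (placeholder);
    conf_name is an arbitrary file name under /etc/ld.so.conf.d/.
    """
    conf_path = "/etc/ld.so.conf.d/" + conf_name
    return [
        # Tell the loader about the extra library directory.
        "echo %s > %s" % (shlex.quote(lib_dir), shlex.quote(conf_path)),
        # Rebuild the loader cache so the new libraries are found.
        "ldconfig",
    ]


if __name__ == "__main__":
    for command in ldconfig_commands("/opt/TensorRT/lib"):
        print(command)
```

Both commands need root privileges. Setting LD_LIBRARY_PATH to the same directory is the per-shell alternative that avoids touching system configuration.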

Hi, I am discussing package management for older versions with the TensorRT team. I will help you handle this problem this week. Static linking is really hard for AI/ML engineers.

aseaday avatar Nov 22 '22 09:11 aseaday