AI-Engineer-Note
Things I have collected on the road to becoming a Senior AI Engineer
A collection of notes for AI Engineers & service deployment
-
Deeplearning
- 1. ComputerVision
- 1.1 Common Architectures
- 1.1.1 ResBlock
- 1.1.2 Gated Convolution
- 1.1.3 Multi-head Attention
- 2. NLP
-
Frameworks
- 1. TensorRT
- 1.1 Convert ONNX model to TensorRT
- 1.2 Wrapped TensorRT-CPP Models
- 2. Pytorch
- 2.1 Build Pytorch from source (Optimize speed for AMD CPU & NVIDIA GPU)
-
Deploy
- 1. NVIDIA
- 1.1 Multi-instance GPU (MIG)
- 1.2 FFMPEG with Nvidia hardware-acceleration
- 2. Deepstream
- 2.1 Yolov4
- 2.2 Traffic Analyst
- 2.3 SCRFD Face Detection (custom parser & NMS plugin with landmark)
- 3. Triton Inference Server
- 3.1 Installing triton-server and triton-client
- 3.1.1 Model management modes (load/unload/reload)
- 3.2 Overview of the backends in Triton
- 3.3 Basic configuration when deploying a model
- 3.4 Deploying a model
- 3.4.1 ONNX-runtime
- 3.4.2 TensorRT
- 3.4.3 Pytorch & TorchScript
- 3.4.4 Kaldi (Advanced)
- 3.5 Model Batching
- 3.6 Ensemble Models and pre/post processing
- 3.7 Sử dụng Performance Analyzer Tool
- 3.8 Optimizations
- 3.8.1 Optimizing the Pytorch backend
- 4. TAO Toolkit (Transfer-Learning-Toolkit)
-
Linux & CUDA & APT-Packages
-
Build OpenCV from source
-
Install Math Kernel Library (MKL/BLAS/LAPACK/OPENBLAS)
It is recommended to install all of the math kernel libraries first and then compile your framework (e.g. PyTorch, MXNet) from source with a custom config for the best performance; a build sketch is shown after the install commands below.

Install all LAPACK + BLAS packages:

sudo apt install libjpeg-dev libpng-dev libblas-dev libopenblas-dev libatlas-base-dev liblapack-dev liblapacke-dev gfortran

Install MKL:

# Get the key
wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
# Now install that key
apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
# Now remove the public key file and exit the root shell
rm GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
# Add the Intel repositories to apt
sudo wget https://apt.repos.intel.com/setup/intelproducts.list -O /etc/apt/sources.list.d/intelproducts.list
sudo sh -c 'echo deb https://apt.repos.intel.com/mkl all main > /etc/apt/sources.list.d/intel-mkl.list'
# Install
sudo apt-get update
sudo apt-get install intel-mkl-2020.4-912
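With MKL (or OpenBLAS) installed, the BLAS backend can be selected when compiling PyTorch from source. This is only a minimal sketch, assuming the PyTorch source tree is already cloned; BLAS and USE_MKLDNN are standard PyTorch build variables, but check the setup.py of your checkout for the options it actually supports.

# Minimal sketch: build PyTorch against MKL (assumes the repo is already cloned)
cd pytorch
pip install -r requirements.txt
BLAS=MKL USE_MKLDNN=1 python setup.py install
# or, to link against OpenBLAS instead of MKL:
# BLAS=OpenBLAS python setup.py install
-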
Fresh install NVIDIA driver (PC/Laptop/Workstation)
# Remove old packages
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get install ubuntu-desktop
sudo apt-get --purge remove "*cublas*" "cuda*"
sudo apt-get --purge remove "*nvidia*"
sudo add-apt-repository --remove ppa:graphics-drivers/ppa
sudo rm /etc/X11/xorg.conf
sudo apt autoremove
sudo reboot

# After restart
sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot
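After the final reboot, a quick sanity check that the driver loaded correctly:

# Should list the driver version and all attached GPUs without errors
nvidia-smi
-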
Install CuDNN
Install the keyring first: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#network-repo-installation-for-ubuntu

Then install cuDNN 9 for CUDA 11:

sudo apt-get update
sudo apt-get -y install cudnn9-cuda-11
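A quick check that cuDNN is installed and can be loaded (sketch; the second line assumes a CUDA build of PyTorch is installed):

# List the installed cuDNN packages
dpkg -l | grep cudnn
# Print the cuDNN version that PyTorch managed to load
python -c "import torch; print(torch.backends.cudnn.version())"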
-
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver
First, make sure that you have done a "Fresh install NVIDIA driver" as above. If that does not work, try the steps below.
- Make sure the package nvidia-prime is installed and selected:
sudo apt install nvidia-prime
sudo prime-select nvidia
- Make sure that NVIDIA is not blacklisted. Run
grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*
to find a file containing "blacklist nvidia", remove it, then run
sudo update-initramfs -u
- Get the boot log:
journalctl -b | grep NVIDIA
- If you get the error "This PCI I/O region assigned to your NVIDIA device is invalid", edit the GRUB config:
sudo nano /etc/default/grub
and set
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=realloc=off"
then run
sudo update-grub
sudo reboot
-
Check current CUDA version
nvcc --version
-
Check current supported CUDA versions
ls /usr/local/
-
Select GPU devices
CUDA_VISIBLE_DEVICES=<index-of-devices> <command>
CUDA_VISIBLE_DEVICES=0 python abc.py
CUDA_VISIBLE_DEVICES=0 ./sample.sh
CUDA_VISIBLE_DEVICES=0,1,2,3 python abc.py
CUDA_VISIBLE_DEVICES=0,1,2,3 ./sample.sh
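The variable can also be exported once for a whole shell session; note that the visible devices are re-indexed from 0 inside the process (train.py below is just a placeholder script name):

# Export once; every following command in this shell sees only GPU 0 and 1
export CUDA_VISIBLE_DEVICES=0,1
python train.py   # inside the process they appear as cuda:0 and cuda:1
-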
Switch CUDA version
CUDA_VER=11.3
export PATH="/usr/local/cuda-$CUDA_VER/bin:$PATH"
export LD_LIBRARY_PATH=/usr/local/cuda-$CUDA_VER/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
-
Check NVENV/NVDEC status
nvidia-smi dmon
See the %enc and %dec columns.
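To actually see those columns move, run a hardware-accelerated transcode in another terminal (sketch; assumes an FFMPEG build with NVENC/NVDEC support and an input file named input.mp4):

# Decode on NVDEC and encode on NVENC; %dec and %enc should rise while this runs
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -c:v h264_nvenc output.mp4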
-
Error with distributed training NCCL (training freezes)
export NCCL_P2P_DISABLE="1"
-
Broken pipe (Distributed training with NCCL)
Run training with the args
NCCL_DEBUG=INFO TORCH_CPP_LOG_LEVEL=INFO TORCH_DISTRIBUTED_DEBUG=INFO torchrun ...
to find the socket interface name (e.g. eno1) in the log:
NCCL INFO NET/IB : No device found.
rnd3:77634:79720 [0] NCCL INFO NET/Socket : Using [0]eno1:10.9.3.241<0>
rnd3:77634:79720 [0] NCCL INFO Using network Socket
On the other nodes, run with the arg
NCCL_SOCKET_IFNAME=eno1
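A minimal two-node sketch of how that variable is used with torchrun (the address, port, GPU count and train.py are placeholders):

# On the master node (node rank 0)
NCCL_SOCKET_IFNAME=eno1 torchrun --nnodes=2 --node_rank=0 --nproc_per_node=4 --master_addr=10.9.3.241 --master_port=29500 train.py
# On the second node (node rank 1)
NCCL_SOCKET_IFNAME=eno1 torchrun --nnodes=2 --node_rank=1 --nproc_per_node=4 --master_addr=10.9.3.241 --master_port=29500 train.py
-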
Install CMake from source
version=3.23
build=2
## don't modify from here
mkdir ~/temp
cd ~/temp
wget https://cmake.org/files/v$version/cmake-$version.$build.tar.gz
tar -xzvf cmake-$version.$build.tar.gz
cd cmake-$version.$build/
./bootstrap
make -j8
sudo make install
-
Install NCCL Backend (Distributed training)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt install libnccl2 libnccl-dev
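To confirm that the backend is usable from PyTorch (sketch, assuming a CUDA build of PyTorch is installed):

# True means torch.distributed can use the NCCL backend
python -c "import torch; print(torch.distributed.is_nccl_available())"
# Print the NCCL version PyTorch was built against
python -c "import torch; print(torch.cuda.nccl.version())"
-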
Install MXNet from source
git clone --recursive --branch 1.9.1 https://github.com/apache/incubator-mxnet.git mxnet
cd mxnet
cp config/linux_gpu.cmake config.cmake
rm -rf build
mkdir -p build && cd build
cmake -DUSE_CUDA=ON -DUSE_CUDNN=OFF -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_OPENMP=OFF -DUSE_OPENCV=ON -DUSE_BLAS=open ..
make -j32
cd ../python
pip install --user -e .
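A quick import test after the build (sketch):

# Should print the MXNet version and the number of GPUs it can see
python -c "import mxnet; print(mxnet.__version__, mxnet.context.num_gpus())"
-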
Tensorflow could not load dynamic library 'cudart64_101.dll'
In the example above, TensorFlow requires CUDA 10.1. Either switch to CUDA 10.1 (see "Switch CUDA version" above) or install a TensorFlow version that is compatible with your installed CUDA version; the compatibility table is here: https://www.tensorflow.org/install/source#gpu
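One way to see which CUDA/cuDNN versions an installed TensorFlow wheel was built against (sketch; the build-info API is available in TF 2.x):

python -c "import tensorflow as tf; print(tf.sysconfig.get_build_info())"
-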
Fix Deepstream (6.2+) FFMPEG OpenCV installation
Fixes errors about undefined references to, or missing, libavcodec, libavutil, libvpx, ...
apt-get install --reinstall --no-install-recommends -y libavcodec58 libavcodec-dev libavformat58 libavformat-dev libavutil56 libavutil-dev gstreamer1.0-libav
apt install --reinstall gstreamer1.0-plugins-good
apt install --reinstall libvpx6 libx264-155 libx265-179 libmpg123-0 libmpeg2-4 libmpeg2encpp-2.1-0
gst-inspect-1.0 | grep 264
rm ~/.cache/gstreamer-1.0/registry.x86_64.bin
apt install --reinstall libx264-155
apt-get install gstreamer1.0-libav
apt-get install --reinstall gstreamer1.0-plugins-ugly
-
Gstreamer pipeline to convert MP4-MP4 with re-encoding
gst-launch-1.0 filesrc location="<path-to-input>" ! qtdemux ! video/x-h264 ! h264parse ! avdec_h264 ! videoconvert ! x264enc ! h264parse ! qtmux ! filesink location=<path-to-output>
-
Gstreamer pipeline to convert RTSP-RTMP
gst-launch-1.0 rtspsrc location='rtsp://<path-to-rtsp-input>' ! rtph264depay ! h264parse ! flvmux ! rtmpsink location='rtmp://<path-to-rtmp-output>'
-
Gstreamer pipeline to convert RTSP-RTMP with reduced resolution
gst-launch-1.0 rtspsrc location='rtsp://<path-to-rtsp-input>' ! rtpbin ! rtph264depay ! h264parse ! avdec_h264 ! videoconvert ! videoscale ! video/x-raw,width=640,height=640 ! x264enc ! h264parse ! flvmux streamable=true ! rtmpsink location='rtmp://<path-to-rtmp-output>'
-