opencv_contrib
cv::cuda::NvidiaOpticalFlow_2_0 producing strange flow vectors
System information
CPU
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              12
On-line CPU(s) list: 0-11
Thread(s) per core:  2
Core(s) per socket:  6
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping:            7
CPU MHz:             2200.156
BogoMIPS:            4400.31
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
L3 cache:            39424K
NUMA node0 CPU(s):   0-11
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
GPU
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P0    49W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
OS
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
OpenCV Build Information
General configuration for OpenCV 4.5.4 =====================================
  Version control:               unknown
  Extra modules:
    Location (extra):            /home/shrik/opencv/opencv_contrib-4.5.4/modules
    Version control (extra):     unknown
  Platform:
    Timestamp:                   2021-11-07T22:46:51Z
    Host:                        Linux 4.19.0-17-cloud-amd64 x86_64
    CMake:                       3.13.4
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               RELEASE
  CPU/HW features:
    Baseline:                    SSE SSE2 SSE3
      requested:                 SSE3
    Dispatched code generation:  SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
      requested:                 SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
      SSE4_1 (17 files):         + SSSE3 SSE4_1
      SSE4_2 (2 files):          + SSSE3 SSE4_1 POPCNT SSE4_2
      FP16 (1 files):            + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
      AVX (5 files):             + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
      AVX2 (32 files):           + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
      AVX512_SKX (8 files):      + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
  C/C++:
    Built as dynamic libs?:      YES
    C++ standard:                11
    C++ Compiler:                /usr/bin/c++  (ver 8.3.0)
    C++ flags (Release):         -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections  -msse -msse2 -msse3 -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a   -Wl,--gc-sections -Wl,--as-needed
    Linker flags (Debug):        -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a   -Wl,--gc-sections -Wl,--as-needed
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu
    3rdparty dependencies:
  OpenCV modules:
    To be built:                 alphamat aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency sfm shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    cudacodec world
    Disabled by dependency:      -
    Unavailable:                 cvv java julia matlab ovis python2 viz
    Applications:                tests perf_tests apps
    Documentation:               NO
    Non-free algorithms:         YES
  GUI:                           GTK3
    GTK+:                        YES (ver 3.24.5)
      GThread :                  YES (ver 2.58.3)
      GtkGlExt:                  NO
    OpenGL support:              NO
    VTK support:                 NO
  Media I/O:
    ZLib:                        /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.11)
    JPEG:                        /usr/lib/x86_64-linux-gnu/libjpeg.so (ver 62)
    WEBP:                        build (ver encoder: 0x020f)
    PNG:                         /usr/lib/x86_64-linux-gnu/libpng.so (ver 1.6.36)
    TIFF:                        /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
    JPEG 2000:                   build (ver 2.4.0)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES
  Video I/O:
    DC1394:                      YES (2.2.5)
    FFMPEG:                      YES
      avcodec:                   YES (58.35.100)
      avformat:                  YES (58.20.100)
      avutil:                    YES (56.22.100)
      swscale:                   YES (5.3.100)
      avresample:                YES (4.0.0)
    GStreamer:                   YES (1.14.4)
    v4l/v4l2:                    YES (linux/videodev2.h)
  Parallel framework:            TBB (ver 2018.0 interface 10006)
  Trace:                         YES (with Intel ITT)
  Other third-party libraries:
    Intel IPP:                   2020.0.0 Gold [2020.0.0]
           at:                   /home/shrik/opencv/opencv-4.5.4/build/3rdparty/ippicv/ippicv_lnx/icv
    Intel IPP IW:                sources (2020.0.0)
              at:                /home/shrik/opencv/opencv-4.5.4/build/3rdparty/ippicv/ippicv_lnx/iw
    VA:                          NO
    Lapack:                      NO
    Eigen:                       YES (ver 3.3.7)
    Custom HAL:                  NO
    Protobuf:                    build (3.5.1)
  NVIDIA CUDA:                   YES (ver 11.0, CUFFT CUBLAS FAST_MATH)
    NVIDIA GPU arch:             80
    NVIDIA PTX archs:
  cuDNN:                         YES (ver 8.0.5)
  OpenCL:                        YES (no extra features)
    Include path:                /home/shrik/opencv/opencv-4.5.4/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load
  Python 3:
    Interpreter:                 /usr/bin/python3 (ver 3.7.3)
    Libraries:                   /usr/lib/x86_64-linux-gnu/libpython3.7m.so (ver 3.7.3)
    numpy:                       /usr/local/lib/python3.7/dist-packages/numpy/core/include (ver 1.21.4)
    install path:                /home/shrik/opencv/python/cv2/python-3.7
  Python (for build):            /usr/bin/python2.7
  Java:
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO
  Install to:                    /home/shrik/opencv/install
-----------------------------------------------------------------
Detailed description
I am seeing strange optical-flow results from cv::cuda::NvidiaOpticalFlow_2_0 compared to cv::cuda::FarnebackOpticalFlow. The flow vectors produced by cv::cuda::NvidiaOpticalFlow_2_0 are much larger than expected, whereas cv::cuda::FarnebackOpticalFlow produces meaningful vectors. I noticed this in some videos I was processing. Below is a snippet that reproduces the problem with a pair of frames from this video.
Steps to reproduce
The files frame0.png and frame1.png are attached.
import numpy as np
import cv2

def max_rho(flow):
    # Download the flow field (a cv2.cuda_GpuMat) and print the largest vector magnitude.
    flow = flow.download().astype(np.float32)
    fx, fy = np.split(flow, 2, axis=2)
    f_rho = np.sqrt(fx*fx + fy*fy)
    print(f_rho.max())

# Load both frames, convert them to grayscale and upload them to the GPU.
frame0 = cv2.imread('frame0.png')
frame1 = cv2.imread('frame1.png')
frame0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
frame1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
cuframe0 = cv2.cuda_GpuMat(frame0)
cuframe1 = cv2.cuda_GpuMat(frame1)

# Farneback optical flow (reference result).
opt_flow_0 = cv2.cuda.FarnebackOpticalFlow_create()
flow_0 = opt_flow_0.calc(cuframe0, cuframe1, None)
max_rho(flow_0)  # prints 8.191473

# NVIDIA Optical Flow 2.0 on the same frame pair.
H, W = frame0.shape[:2]
# Changing outputGridSize produces different results, but the flow vectors are still too large.
params = {'perfPreset': cv2.cuda.NvidiaOpticalFlow_2_0_NV_OF_PERF_LEVEL_SLOW,
          'outputGridSize': cv2.cuda.NvidiaOpticalFlow_2_0_NV_OF_OUTPUT_VECTOR_GRID_SIZE_1}
opt_flow_1 = cv2.cuda.NvidiaOpticalFlow_2_0_create((W, H), **params)
flow_1 = opt_flow_1.calc(cuframe0, cuframe1, None)  # returns (flow, cost)
max_rho(flow_1[0])  # prints 303.37106 <- very large
 

Issue submission checklist
- [ x ] I report the issue, it's not a question
- [ x ] I checked the problem with documentation, FAQ, open issues, answers.opencv.org, Stack Overflow, etc and have not found solution
- [ x ] I updated to latest OpenCV version and the issue is still there
- [ x ] There is reproducer code and related data files: videos, images, onnx, etc
Hello @klshrinidhi, I have the exact same issue. Did you ever figure out what is the problem?
I did not unfortunately. Please let me know if you find anything. Thanks !!
I'm trying to switch from the CPU implementation cv2.optflow.createOptFlow_DualTVL1() to a GPU one. In your experience, which GPU flow method best replicates the CPU method?
I didn't try other methods. Once I realized the problem I described above, I moved on. 😆
Hi, in case anyone is still interested in the solution: the raw output of NVIDIA's optical flow estimator is in a 16-bit fixed-point representation and has to be converted to the correct float values using the convertToFloat() method of the estimator object.
So, in the example above, it would look something like this:
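opt_flow_1 = cv2.cuda.NvidiaOpticalFlow_2_0_create((W, H), **params)
# calc() returns (flow, cost); keep only the raw fixed-point flow
flow_1, _ = opt_flow_1.calc(cuframe0, cuframe1, None)
# Convert the fixed-point flow to float displacements in pixels
flow_1_float = opt_flow_1.convertToFloat(flow_1, None)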
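For intuition about why the unconverted vectors look roughly 32 times too large, here is a small pure-Python sketch. It assumes the S10.5 fixed-point layout described in NVIDIA's Optical Flow SDK documentation (a signed 16-bit integer with 5 fractional bits), so a displacement of one pixel is stored as 32. The helper names `fixed_to_pixels` and `magnitude` and the sample vector are hypothetical; in real code, `convertToFloat` is the way to do the conversion.

```python
import math

# Assumption: each flow component from NVIDIA's optical flow engine is a
# signed 16-bit integer in S10.5 fixed point, i.e. with 5 fractional bits.
FRACTIONAL_BITS = 5
SCALE = 1 << FRACTIONAL_BITS  # 2**5 == 32


def fixed_to_pixels(raw: int) -> float:
    """Convert one raw S10.5 flow component to a displacement in pixels."""
    if not -32768 <= raw <= 32767:
        raise ValueError("raw value does not fit in a signed 16-bit integer")
    return raw / SCALE


def magnitude(fx: float, fy: float) -> float:
    """Euclidean length of a single flow vector, as max_rho computes it."""
    return math.sqrt(fx * fx + fy * fy)


# Hypothetical raw vector, as it would appear if read without conversion.
raw_fx, raw_fy = 96, 128
fx, fy = fixed_to_pixels(raw_fx), fixed_to_pixels(raw_fy)

print(magnitude(raw_fx, raw_fy))  # 160.0  (raw units, 32x too large)
print(magnitude(fx, fy))          # 5.0    (pixels)

# The reporter's raw maximum, rescaled the same way:
print(round(303.37106 / SCALE, 2))  # 9.48
```

Because scaling both components by 1/32 scales the magnitude by 1/32 as well, the reporter's raw maximum of 303.37106 would correspond to about 9.48 px under this assumption, the same order of magnitude as the 8.191473 reported by FarnebackOpticalFlow. The two algorithms are not expected to agree exactly.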
This solved my (too-large flow values) problem, thanks a lot!