opencv_contrib
cv::cuda::NvidiaOpticalFlow_2_0 producing strange flow-vectors
System information
CPU
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 7
CPU MHz: 2200.156
BogoMIPS: 4400.31
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 39424K
NUMA node0 CPU(s): 0-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f
avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512_vnni md_clear arch_capabilities
GPU
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01 Driver Version: 470.82.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... Off | 00000000:00:04.0 Off | 0 |
| N/A 36C P0 49W / 400W | 0MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
OS
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 10 (buster)
Release: 10
Codename: buster
OpenCV Build Information
General configuration for OpenCV 4.5.4 =====================================
Version control: unknown
Extra modules:
Location (extra): /home/shrik/opencv/opencv_contrib-4.5.4/modules
Version control (extra): unknown
Platform:
Timestamp: 2021-11-07T22:46:51Z
Host: Linux 4.19.0-17-cloud-amd64 x86_64
CMake: 3.13.4
CMake generator: Unix Makefiles
CMake build tool: /usr/bin/make
Configuration: RELEASE
CPU/HW features:
Baseline: SSE SSE2 SSE3
requested: SSE3
Dispatched code generation: SSE4_1 SSE4_2 FP16 AVX AVX2 AVX512_SKX
requested: SSE4_1 SSE4_2 AVX FP16 AVX2 AVX512_SKX
SSE4_1 (17 files): + SSSE3 SSE4_1
SSE4_2 (2 files): + SSSE3 SSE4_1 POPCNT SSE4_2
FP16 (1 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 AVX
AVX (5 files): + SSSE3 SSE4_1 POPCNT SSE4_2 AVX
AVX2 (32 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2
AVX512_SKX (8 files): + SSSE3 SSE4_1 POPCNT SSE4_2 FP16 FMA3 AVX AVX2 AVX_512F AVX512_COMMON AVX512_SKX
C/C++:
Built as dynamic libs?: YES
C++ standard: 11
C++ Compiler: /usr/bin/c++ (ver 8.3.0)
C++ flags (Release): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
C++ flags (Debug): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -fvisibility-inlines-hidden -g -O0 -DDEBUG -D_DEBUG
C Compiler: /usr/bin/cc
C flags (Release): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -O3 -DNDEBUG -DNDEBUG
C flags (Debug): -fsigned-char -ffast-math -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -Wno-long-long -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections -msse -msse2 -msse3 -fvisibility=hidden -g -O0 -DDEBUG -D_DEBUG
Linker flags (Release): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed
Linker flags (Debug): -Wl,--exclude-libs,libippicv.a -Wl,--exclude-libs,libippiw.a -Wl,--gc-sections -Wl,--as-needed
ccache: NO
Precompiled headers: NO
Extra dependencies: m pthread cudart_static dl rt nppc nppial nppicc nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/x86_64-linux-gnu
3rdparty dependencies:
OpenCV modules:
To be built: alphamat aruco barcode bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency sfm shape stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
Disabled: cudacodec world
Disabled by dependency: -
Unavailable: cvv java julia matlab ovis python2 viz
Applications: tests perf_tests apps
Documentation: NO
Non-free algorithms: YES
GUI: GTK3
GTK+: YES (ver 3.24.5)
GThread : YES (ver 2.58.3)
GtkGlExt: NO
OpenGL support: NO
VTK support: NO
Media I/O:
ZLib: /usr/lib/x86_64-linux-gnu/libz.so (ver 1.2.11)
JPEG: /usr/lib/x86_64-linux-gnu/libjpeg.so (ver 62)
WEBP: build (ver encoder: 0x020f)
PNG: /usr/lib/x86_64-linux-gnu/libpng.so (ver 1.6.36)
TIFF: /usr/lib/x86_64-linux-gnu/libtiff.so (ver 42 / 4.1.0)
JPEG 2000: build (ver 2.4.0)
OpenEXR: build (ver 2.3.0)
HDR: YES
SUNRASTER: YES
PXM: YES
PFM: YES
Video I/O:
DC1394: YES (2.2.5)
FFMPEG: YES
avcodec: YES (58.35.100)
avformat: YES (58.20.100)
avutil: YES (56.22.100)
swscale: YES (5.3.100)
avresample: YES (4.0.0)
GStreamer: YES (1.14.4)
v4l/v4l2: YES (linux/videodev2.h)
Parallel framework: TBB (ver 2018.0 interface 10006)
Trace: YES (with Intel ITT)
Other third-party libraries:
Intel IPP: 2020.0.0 Gold [2020.0.0]
at: /home/shrik/opencv/opencv-4.5.4/build/3rdparty/ippicv/ippicv_lnx/icv
Intel IPP IW: sources (2020.0.0)
at: /home/shrik/opencv/opencv-4.5.4/build/3rdparty/ippicv/ippicv_lnx/iw
VA: NO
Lapack: NO
Eigen: YES (ver 3.3.7)
Custom HAL: NO
Protobuf: build (3.5.1)
NVIDIA CUDA: YES (ver 11.0, CUFFT CUBLAS FAST_MATH)
NVIDIA GPU arch: 80
NVIDIA PTX archs:
cuDNN: YES (ver 8.0.5)
OpenCL: YES (no extra features)
Include path: /home/shrik/opencv/opencv-4.5.4/3rdparty/include/opencl/1.2
Link libraries: Dynamic load
Python 3:
Interpreter: /usr/bin/python3 (ver 3.7.3)
Libraries: /usr/lib/x86_64-linux-gnu/libpython3.7m.so (ver 3.7.3)
numpy: /usr/local/lib/python3.7/dist-packages/numpy/core/include (ver 1.21.4)
install path: /home/shrik/opencv/python/cv2/python-3.7
Python (for build): /usr/bin/python2.7
Java:
ant: NO
JNI: NO
Java wrappers: NO
Java tests: NO
Install to: /home/shrik/opencv/install
-----------------------------------------------------------------
Detailed description
I am seeing strange optical-flow results from cv::cuda::NvidiaOpticalFlow_2_0 compared to cv::cuda::FarnebackOpticalFlow. The flow vectors produced by cv::cuda::NvidiaOpticalFlow_2_0 are much larger than expected, whereas cv::cuda::FarnebackOpticalFlow produces meaningful vectors. I noticed this in some videos I was processing. Below is a snippet that reproduces the problem with a pair of frames from this video.
Steps to reproduce
The files frame0.png and frame1.png are attached.
import numpy as np
import cv2

def max_rho(flow):
    flow = flow.download().astype(np.float32)
    fx,fy = np.split(flow,2,axis=2)
    f_rho = np.sqrt(fx*fx + fy*fy)
    max_rho = f_rho.max()
    print(max_rho)

frame0 = cv2.imread('frame0.png')
frame1 = cv2.imread('frame1.png')
frame0 = cv2.cvtColor(frame0,cv2.COLOR_BGR2GRAY)
frame1 = cv2.cvtColor(frame1,cv2.COLOR_BGR2GRAY)
cuframe0 = cv2.cuda_GpuMat(frame0)
cuframe1 = cv2.cuda_GpuMat(frame1)

opt_flow_0 = cv2.cuda.FarnebackOpticalFlow_create()
flow_0 = opt_flow_0.calc(cuframe0,cuframe1,None)
max_rho(flow_0) # prints 8.191473

H,W = frame0.shape[:2]
params = {'perfPreset':cv2.cuda.NvidiaOpticalFlow_2_0_NV_OF_PERF_LEVEL_SLOW,
          'outputGridSize':cv2.cuda.NvidiaOpticalFlow_2_0_NV_OF_OUTPUT_VECTOR_GRID_SIZE_1} # Changing this param produces different results but they are still too large flow-vectors.
opt_flow_1 = cv2.cuda.NvidiaOpticalFlow_2_0_create((W,H),**params)
flow_1 = opt_flow_1.calc(cuframe0,cuframe1,None)
max_rho(flow_1[0]) # prints 303.37106 <- very large

Issue submission checklist
- [ x ] I report the issue, it's not a question
- [ x ] I checked the problem with documentation, FAQ, open issues, answers.opencv.org, Stack Overflow, etc and have not found solution
- [ x ] I updated to latest OpenCV version and the issue is still there
- [ x ] There is reproducer code and related data files: videos, images, onnx, etc
Hello @klshrinidhi, I have the exact same issue. Did you ever figure out what is the problem?
I did not unfortunately. Please let me know if you find anything. Thanks !!
I'm trying to switch from the CPU implementation cv2.optflow.createOptFlow_DualTVL1() to a GPU one. In your experience, which GPU flow method best replicates the CPU method?
I didn't try other methods. Once I realized the problem I describe above, I moved on. 😆
Hi, in case anyone is still interested in the solution: the raw output of NVIDIA's optical flow estimator is in a 16-bit fixed-point representation and has to be converted to the correct float values using the convertToFloat function of the estimator object.
So, in the example above, it would look something like this:
opt_flow_1 = cv2.cuda.NvidiaOpticalFlow_2_0_create((W,H),**params)
flow_1 = opt_flow_1.calc(cuframe0,cuframe1,None)
flow_1_float = opt_flow_1.convertToFloat(flow_1[0], None) # flow_1[0] is the raw flow, as in the reproducer above
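To make the fixed-point conversion concrete, here is a minimal NumPy-only sketch of what it amounts to. It assumes the raw vectors use the S10.5 layout (signed 16-bit, 5 fractional bits) described in NVIDIA's Optical Flow SDK documentation, so each raw value is the displacement in pixels multiplied by 32. The helper name fixed_point_flow_to_float is hypothetical and is not part of OpenCV; convertToFloat remains the supported way to do this.

```python
import numpy as np

def fixed_point_flow_to_float(raw_flow, fractional_bits=5):
    """Convert raw int16 fixed-point flow to float32 pixel displacements.

    Assumption: the input uses 5 fractional bits (S10.5), so dividing
    by 2**5 = 32 yields the displacement in pixels.
    """
    raw = np.asarray(raw_flow, dtype=np.int16)
    return raw.astype(np.float32) / float(1 << fractional_bits)

# Tiny made-up example: a 1x2 "image" with (x, y) flow per pixel.
raw = np.array([[[32, -16], [303, 0]]], dtype=np.int16)
flow = fixed_point_flow_to_float(raw)
print(flow[0, 0])  # 1 pixel in x, -0.5 pixel in y
```

Under that assumption, the 303.37 magnitude reported in the issue corresponds to about 303.37 / 32 ≈ 9.48 pixels, which is the same order of magnitude as the 8.19 obtained from Farneback.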
This solved my problem (flow values that were too large), thanks a lot!