
jetson.utils.videoSource.Capture() latency issue

Open GeneralErrorOK opened this issue 4 years ago • 12 comments

I'm building a fairly large live/realtime object detection and image analysis application on a Jetson Xavier NX devboard and I'm running into a significant performance issue. I noticed it right from the start, but we had other issues to deal with first; now that the application is mostly ready for production, we have to address performance-related issues as well.

The issue is video FPS. We developed the application with sample videos (H.264 MP4 files) at 25 fps, but they mostly played back at 10 to 17 fps. I was hoping this was an MP4-file-related issue, which wouldn't really matter for us since our application needs to source from RTSP, but with RTSP the performance hit is even bigger. Some streams (1920x1080 H.264) play at 7 or 8 frames per second... All the while, resource usage is very low: around 20 to 25% CPU, and next to nothing on the GPU.

To troubleshoot, I've tried to narrow the problem down by removing most of our own application code. I used the Python line_profiler to see where the bottleneck is, and in our own application it was the videoSource.Capture() method, which took around 90 to 120 ms per call.
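For reference, the same measurement can be reproduced without line_profiler by timing Capture() directly. This is a minimal sketch; the RTSP URI is only a placeholder, and the 300-frame loop length is arbitrary:

import time
import jetson.utils

# placeholder URI; substitute your own RTSP stream or local video file
input = jetson.utils.videoSource("rtsp://user:pass@<camera-ip>:554/stream")

times = []
for _ in range(300):
    start = time.perf_counter()
    image = input.Capture()          # the blocking call being measured
    times.append(time.perf_counter() - start)

print("average Capture() time: {:.1f} ms".format(1000 * sum(times) / len(times)))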

I've also profiled the video-viewer.py example, which shows the same problem:

Timer unit: 1e-06 s

Total time: 48.5157 s
File: video-viewer-profiled.py
Function: main at line 29

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    29                                           @profile
    30                                           def main():
    31                                           	# parse command line
    32         1         10.0     10.0      0.0  	parser = argparse.ArgumentParser(description="View various types of video streams", 
    33         1          8.0      8.0      0.0  									formatter_class=argparse.RawTextHelpFormatter, 
    34         1       4078.0   4078.0      0.0  									epilog=jetson.utils.videoSource.Usage() + jetson.utils.videoOutput.Usage() + jetson.utils.logUsage())
    35                                           
    36         1        235.0    235.0      0.0  	parser.add_argument("input_URI", type=str, help="URI of the input stream")
    37         1        211.0    211.0      0.0  	parser.add_argument("output_URI", type=str, default="", nargs='?', help="URI of the output stream")
    38                                           
    39         1          3.0      3.0      0.0  	try:
    40         1       3416.0   3416.0      0.0  		opt = parser.parse_known_args()[0]
    41                                           	except:
    42                                           		print("")
    43                                           		parser.print_help()
    44                                           		sys.exit(0)
    45                                           
    46                                           	# create video sources & outputs
    47         1    1609484.0 1609484.0      3.3  	input = jetson.utils.videoSource(opt.input_URI, argv=sys.argv)
    48         1      90103.0  90103.0      0.2  	output = jetson.utils.videoOutput(opt.output_URI, argv=sys.argv)
    49                                           
    50                                           	# capture frames until user exits
    51       339       2380.0      7.0      0.0  	while output.IsStreaming():
    52       339   44647334.0 131703.1     92.0  		image = input.Capture()
    53                                           		#print(image)
    54       338    2126203.0   6290.5      4.4  		output.Render(image)
    55       338      32276.0     95.5      0.1  		output.SetStatus("Video Viewer | {:d}x{:d} | {:.1f} FPS".format(image.width, image.height, output.GetFrameRate()))

You can see that with this video (an RTSP network stream with latency set to 500) the Capture() routine takes about 131 ms on average... That equates to less than 8 frames per second. This is very strange; the Xavier NX should be able to handle multiple 25 fps streams at once, shouldn't it?

GeneralErrorOK avatar Sep 06 '21 11:09 GeneralErrorOK

To be sure this isn't an RTSP/network-related issue, here is the profile result of the exact same video-viewer.py script, but with an MP4 video file from the local filesystem (on an M.2 SSD):

Timer unit: 1e-06 s

Total time: 36.611 s
File: video-viewer-profiled.py
Function: main at line 29

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    29                                           @profile
    30                                           def main():
    31                                           	# parse command line
    32         1          9.0      9.0      0.0  	parser = argparse.ArgumentParser(description="View various types of video streams", 
    33         1          5.0      5.0      0.0  									formatter_class=argparse.RawTextHelpFormatter, 
    34         1       3979.0   3979.0      0.0  									epilog=jetson.utils.videoSource.Usage() + jetson.utils.videoOutput.Usage() + jetson.utils.logUsage())
    35                                           
    36         1        220.0    220.0      0.0  	parser.add_argument("input_URI", type=str, help="URI of the input stream")
    37         1        269.0    269.0      0.0  	parser.add_argument("output_URI", type=str, default="", nargs='?', help="URI of the output stream")
    38                                           
    39         1          3.0      3.0      0.0  	try:
    40         1       3036.0   3036.0      0.0  		opt = parser.parse_known_args()[0]
    41                                           	except:
    42                                           		print("")
    43                                           		parser.print_help()
    44                                           		sys.exit(0)
    45                                           
    46                                           	# create video sources & outputs
    47         1     397717.0 397717.0      1.1  	input = jetson.utils.videoSource(opt.input_URI, argv=sys.argv)
    48         1      88798.0  88798.0      0.2  	output = jetson.utils.videoOutput(opt.output_URI, argv=sys.argv)
    49                                           
    50                                           	# capture frames until user exits
    51       361       3230.0      8.9      0.0  	while output.IsStreaming():
    52       361   34012743.0  94218.1     92.9  		image = input.Capture()
    53                                           		#print(image)
    54       361    2064049.0   5717.6      5.6  		output.Render(image)
    55       360      36947.0    102.6      0.1  		output.SetStatus("Video Viewer | {:d}x{:d} | {:.1f} FPS".format(image.width, image.height, output.GetFrameRate()))

It's a little faster, but still a whopping 94 ms per (blocking) call. That means the loop runs at roughly 1/0.094 ≈ 10.6 iterations per second, i.e. less than 11 frames per second, which is really unacceptable.

This is the full console output when running with a local mp4 file:

[gstreamer] initialized gstreamer, version 1.14.5.0
[gstreamer] gstDecoder -- creating decoder for ../../../../TopSecretApplication/Samplevideos/real-obs-video.mp4
Opening in BLOCKING MODE
Opening in BLOCKING MODE 
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
[gstreamer] gstDecoder -- discovered video resolution: 1920x1080  (framerate 30.000000 Hz)
[gstreamer] gstDecoder -- discovered video caps:  video/x-h264, stream-format=(string)byte-stream, alignment=(string)au, level=(string)4, profile=(string)high, width=(int)1920, height=(int)1080, framerate=(fraction)30/1, pixel-aspect-ratio=(fraction)1/1, interlace-mode=(string)progressive, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, parsed=(boolean)true
[gstreamer] gstDecoder -- pipeline string:
[gstreamer] filesrc location=../../../../Ninox/Samplevideos/real-obs-video.mp4 ! qtdemux ! queue ! h264parse ! omxh264dec ! video/x-raw ! appsink name=mysink
[video]  created gstDecoder from file:///media/willem/SSD/Projects/jetson-inference/utils/python/examples/../../../../TopSecretApplication/Samplevideos/real-obs-video.mp4
------------------------------------------------
gstDecoder video options:
------------------------------------------------
  -- URI: file:///media/willem/SSD/Projects/jetson-inference/utils/python/examples/../../../../TopSecretApplication/Samplevideos/real-obs-video.mp4
     - protocol:  file
     - location:  ../../../../TopSecretApplication/Samplevideos/real-obs-video.mp4
     - extension: mp4
  -- deviceType: file
  -- ioType:     input
  -- codec:      h264
  -- width:      1920
  -- height:     1080
  -- frameRate:  30.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
  -- loop:       0
------------------------------------------------
[OpenGL] glDisplay -- X screen 0 resolution:  2560x1440
[OpenGL] glDisplay -- X window resolution:    2560x1440
[OpenGL] glDisplay -- display device initialized (2560x1440)
[video]  created glDisplay from display://0
------------------------------------------------
glDisplay video options:
------------------------------------------------
  -- URI: display://0
     - protocol:  display
     - location:  0
  -- deviceType: display
  -- ioType:     output
  -- codec:      raw
  -- width:      2560
  -- height:     1440
  -- frameRate:  0.000000
  -- bitRate:    0
  -- numBuffers: 4
  -- zeroCopy:   true
  -- flipMethod: none
  -- loop:       0
------------------------------------------------
[gstreamer] opening gstDecoder for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> omxh264dec-omxh264dec0
[gstreamer] gstreamer changed state from NULL to READY ==> h264parse1
[gstreamer] gstreamer changed state from NULL to READY ==> queue0
[gstreamer] gstreamer changed state from NULL to READY ==> qtdemux1
[gstreamer] gstreamer changed state from NULL to READY ==> filesrc0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> omxh264dec-omxh264dec0
[gstreamer] gstreamer changed state from READY to PAUSED ==> h264parse1
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> queue0
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer stream status CREATE ==> sink
[gstreamer] gstreamer changed state from READY to PAUSED ==> qtdemux1
[gstreamer] gstreamer changed state from READY to PAUSED ==> filesrc0
[gstreamer] gstreamer stream status ENTER ==> sink

(python3:12700): GStreamer-CRITICAL **: 13:55:06.597: gst_caps_is_empty: assertion 'GST_IS_CAPS (caps)' failed

(python3:12700): GStreamer-CRITICAL **: 13:55:06.597: gst_caps_truncate: assertion 'GST_IS_CAPS (caps)' failed

(python3:12700): GStreamer-CRITICAL **: 13:55:06.597: gst_caps_fixate: assertion 'GST_IS_CAPS (caps)' failed

(python3:12700): GStreamer-CRITICAL **: 13:55:06.598: gst_caps_get_structure: assertion 'GST_IS_CAPS (caps)' failed

(python3:12700): GStreamer-CRITICAL **: 13:55:06.598: gst_structure_get_string: assertion 'structure != NULL' failed

(python3:12700): GStreamer-CRITICAL **: 13:55:06.598: gst_mini_object_unref: assertion 'mini_object != NULL' failed
NvMMLiteOpen : Block : BlockType = 261 
NVMEDIA: Reading vendor.tegra.display-size : status: 6 
NvMMLiteBlockCreate : Block : BlockType = 261 
Allocating new output: 1920x1088 (x 11), ThumbnailMode = 0
OPENMAX: HandleNewStreamFormat: 3605: Send OMX_EventPortSettingsChanged: nFrameWidth = 1920, nFrameHeight = 1080 
[gstreamer] gstreamer message stream-start ==> pipeline0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer message duration-changed ==> h264parse1
[gstreamer] gstDecoder -- onPreroll()
[gstreamer] gstreamer mysink taglist, video-codec=(string)"H.264\ /\ AVC", bitrate=(uint)2745140;
[gstreamer] gstreamer mysink taglist, encoder=(string)Lavf57.83.100, container-format=(string)"ISO\ MP4/M4A";
[gstreamer] gstreamer mysink taglist, video-codec=(string)"H.264\ \(High\ Profile\)", bitrate=(uint)2745140;
[gstreamer] gstDecoder recieve caps:  video/x-raw, format=(string)NV12, width=(int)1920, height=(int)1080, interlace-mode=(string)progressive, multiview-mode=(string)mono, multiview-flags=(GstVideoMultiviewFlagsSet)0:ffffffff:/right-view-first/left-flipped/left-flopped/right-flipped/right-flopped/half-aspect/mixed-mono, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)mpeg2, colorimetry=(string)bt709, framerate=(fraction)30/1
[gstreamer] gstDecoder -- recieved first frame, codec=h264 format=nv12 width=1920 height=1080 size=3110400
RingBuffer -- allocated 4 buffers (3110400 bytes each, 12441600 bytes total)
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer message async-done ==> pipeline0
[gstreamer] gstreamer message new-clock ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> omxh264dec-omxh264dec0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> h264parse1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> queue0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> qtdemux1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> filesrc0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
RingBuffer -- allocated 4 buffers (6220800 bytes each, 24883200 bytes total)
[OpenGL] glDisplay -- set the window size to 1920x1080
[OpenGL] creating 1920x1080 texture (GL_RGB8 format, 6220800 bytes)
[cuda]   registered openGL texture for interop access (1920x1080, GL_RGB8, 6220800 bytes)

GeneralErrorOK avatar Sep 06 '21 12:09 GeneralErrorOK

Hi @WillemvdKletersteeg, unfortunately without adding additional profiling to gstDecoder.cpp, it's unclear where the bottleneck is coming from.

What I will say is that with CSI or V4L2 cameras, I don't hear reports of this kind of issue. Also, for an optimized RTSP implementation you may want to look at the DeepStream SDK, which is designed for multi-sensor streaming.

dusty-nv avatar Sep 07 '21 18:09 dusty-nv

Update on this topic - initial support for NVMM memory has been implemented in https://github.com/dusty-nv/jetson-utils/commit/35d8adec9a044ce3142ddc9089881c3b6898483f , which eliminates CPU memcpy.

For more info, see this forum thread: https://forums.developer.nvidia.com/t/jetson-utils-videosource-capture-latency-issue/188593/11?u=dusty_nv

dusty-nv avatar Sep 22 '21 19:09 dusty-nv

I still don't understand this issue! When I ran the RTSP stream using gst-launch it worked out well and the latency was almost zero, but the real problem is that when I ran detectnet the detection was very poor and the latency was about 6 seconds.

Any ideas?

Yousef-Alhattab avatar Oct 03 '21 08:10 Yousef-Alhattab

Did you re-clone the repo and rebuild/reinstall from source?

Also run with --input-rtsp-latency=0
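For example (the rebuild steps assume the usual jetson-inference build tree, and the RTSP URI below is only a placeholder):

cd jetson-inference/build
cmake ../
make -j$(nproc)
sudo make install
sudo ldconfig

python3 detectnet.py --input-rtsp-latency=0 rtsp://user:pass@<camera-ip>:554/stream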

dusty-nv avatar Oct 03 '21 17:10 dusty-nv

Thanks a lot, Dusty!

I've got it working, but I ran into some problems. Firstly, there is no lag for the first 5 to 10 seconds, but then the lag increases gradually. Secondly, the detection is bad; it detects almost nothing.

One thing to note is that my camera itself has about a second of lag, but when I run the RTSP stream with the following command:

gst-launch-1.0 rtspsrc location=rtsp://:@<IP>/axis-media/media.amp latency=0 ! decodebin max-size-time=30000000000 ! videoconvert ! autovideosink

there is no latency at all.

Yousef-Alhattab avatar Oct 04 '21 04:10 Yousef-Alhattab

Is there any way to keep using the webcam and send its stream, together with the detections it makes, to a host machine?

Yousef-Alhattab avatar Oct 04 '21 08:10 Yousef-Alhattab

Up to now, the latency issue still exists. Even in a build with NVMM disabled, videoSource::Capture() still takes a long time... Please tell me how to solve it. My device is [email protected] @dusty-nv

Pi-Boss avatar Mar 24 '23 08:03 Pi-Boss

Is there any way to keep using the webcam and send its stream, together with the detections it makes, to a host machine?

Yes, you can use RTP/RTSP, or WebRTC from the browser: https://github.com/dusty-nv/jetson-inference/blob/master/docs/webrtc-server.md
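For example, the detection examples accept an RTP output URI for streaming back to another machine (this assumes the webcam is /dev/video0; the host IP and port are placeholders):

detectnet.py /dev/video0 rtp://<host-ip>:1234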

dusty-nv avatar Mar 24 '23 15:03 dusty-nv

Up to now, the latency issue still exists. Even in a build with NVMM disabled, videoSource::Capture() still takes a long time...

What kind of camera or video source are you using?

If you clone/compile the latest master, it uses sync=false, which may help. You can also set a timeout on Capture() if you need the function not to block.
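Something like this (a sketch assuming an existing videoSource 'input' and videoOutput 'output'; the 1000 ms timeout is arbitrary, and depending on the jetson-utils version a timeout may surface as a None return or as an exception):

while output.IsStreaming():
    try:
        img = input.Capture(timeout=1000)   # wait at most ~1 second for a frame
    except Exception:
        continue                            # some versions raise on timeout
    if img is None:
        continue                            # other versions return None on timeout
    output.Render(img)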

dusty-nv avatar Mar 24 '23 15:03 dusty-nv

Up to now, the latency issue still exists. Even in a build with NVMM disabled, videoSource::Capture() still takes a long time...

What kind of camera or video source are you using?

If you clone/compile the latest master, it uses sync=false, which may help. You can also set a timeout on Capture() if you need the function not to block.

I tried changing the default video decoder from OMX to V4L2 in videoSource, and found that the Capture() method no longer blocks for a long time, which is very strange. Why is this happening? (Cloned from commit 363148fb3558ae58c1b9b21ee2446de2de3355ae.)

Pi-Boss avatar Mar 28 '23 06:03 Pi-Boss

I tried changing the default video decoder from OMX to V4L2 in videoSource, and found that the Capture() method no longer blocks for a long time, which is very strange. Why is this happening?

I'm not sure, but you had probably been using Capture(timeout=-1) (which blocks) or some other larger timeout, and there may be some difference in behavior between the OMX GStreamer elements and the V4L2 ones.

dusty-nv avatar Mar 28 '23 13:03 dusty-nv