gstreamer-imx

imxvpudec_jpeg consumes more CPU than jpegdec on i.MX8

Open NIKovachev opened this issue 2 years ago • 11 comments

Hello, imxvpudec_jpeg (83% CPU) consumes more CPU than jpegdec (67% CPU). What could be the reason? I was expecting the opposite.

imxvpudec_jpeg:

PID USER PRI NI VIRT RES   SHR   S CPU% MEM% TIME+   Command
413 root 20  0  339M 28652 10300 S 83.2 0.7  0:08.56 gst-launch-1.0 v4l2src device=/dev/video0 ! image/jpeg, width=2560, height=1440, framerate=30/1 ! imxvpud

jpegdec:

PID USER PRI NI VIRT RES   SHR  S CPU% MEM% TIME+   Command
419 root 20  0  318M 22540 6420 S 67.0 0.6  0:06.09 gst-launch-1.0 v4l2src device=/dev/video0 ! image/jpeg, width=2560, height=1440, framerate=30/1 ! jpegdec

NIKovachev avatar Jan 18 '23 21:01 NIKovachev

@dv1 any suggestions? Is anything wrong with my setup, or is this expected behaviour? Do you have any benchmarks?

NIKovachev avatar Jan 23 '23 08:01 NIKovachev

This could be because of cache issues. It is worth investigating though. What machine is this exactly? imx8m mini? imx8mq? imx8m plus?

dv1 avatar Jan 25 '23 13:01 dv1

It's an i.MX8M. Data sheet: https://coral.ai/docs/dev-board/datasheet/#system-components

NIKovachev avatar Jan 25 '23 17:01 NIKovachev

Hi @dv1, is there any progress? Did you manage to reproduce this?

NIKovachev avatar Feb 20 '23 13:02 NIKovachev

I tried to replicate this, no luck so far. I attempted this with 2 USB webcams. The Logitech C920 showed slightly higher CPU% with jpegdec compared to imxvpudec_jpeg. The command line:

gst-launch-1.0 v4l2src device=/dev/video4 ! image/jpeg ! queue ! jpegdec ! fakesink sync=true

Replace jpegdec with imxvpudec_jpeg.

I ran this on an imx8mq EVK.

What camera did you use? And what versions of libimxvpuapi and gstreamer-imx are you using?

dv1 avatar Feb 21 '23 22:02 dv1

The key is the resolution: the higher the resolution, the bigger the performance degradation. The reported test case uses image/jpeg, width=2560, height=1440, framerate=30/1, but image/jpeg, width=3840, height=2160, framerate=30/1 is even worse.
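To put those two resolutions in perspective, the sketch below computes the raw byte rate of the decoded video. It assumes the decoder outputs I420 (1.5 bytes per pixel), which is not stated anywhere in this thread; `raw_rate` is a hypothetical helper name.

```shell
# Raw byte rate of decoded video: width * height * 1.5 bytes * fps.
# ASSUMPTION: I420 output (12 bits per pixel); other formats scale differently.
raw_rate() {
    # $1 = width, $2 = height, $3 = frames per second
    echo $(( $1 * $2 * 3 / 2 * $3 ))
}

raw_rate 2560 1440 30   # prints 165888000 (bytes per second)
raw_rate 3840 2160 30   # prints 373248000 (bytes per second)
```

Under that assumption the 4K stream is 2.25 times larger than the 1440p one, so any per-frame work done on the CPU side (copies, cache maintenance) would grow by roughly the same factor.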

NIKovachev avatar Feb 22 '23 15:02 NIKovachev

Sorry, I forgot to mention: libimxvpuapi 2.2.2; gstreamer-imx latest version, commit ebbc5d3 from Dec 10, 2022.

NIKovachev avatar Feb 22 '23 15:02 NIKovachev

And what camera is this? Is it a USB camera? If so, what model? Or is it a camera that is connected through some other means?

dv1 avatar Feb 22 '23 15:02 dv1

It's a 4K USB camera, a Hama c-900 pro: https://pl.hama.com/001399950000/hama-kamera-internetowa-c-900-pro-uhd-4k-usb-c

I'm looking for a way to convert 30fps 4k jpeg into RGB and then apply ML on the images.

NIKovachev avatar Feb 22 '23 15:02 NIKovachev

Hello @dv1 did you manage to reproduce the issue?

NIKovachev avatar Mar 07 '23 10:03 NIKovachev

@NIKovachev I finally got to check this out again.

Since I do not have that webcam, I did this instead:

I created a test 4K MJPEG file with this pipeline:

GST_DEBUG=2 gst-launch-1.0 videotestsrc num-buffers=600 ! videoconvert dither=0 ! "video/x-raw,width=3840,height=2160,format=I420,framerate=30/1" ! queue ! jpegenc quality=70 ! matroskamux ! filesink location=mjpeg-4k-test.mkv

Then I played this on the imx8mq EVK:

GST_DEBUG=2 gst-launch-1.0 filesrc location=mjpeg-4k-test.mkv ! matroskademux ! jpegparse ! imxvpudec_jpeg ! fakesink sync=true

I see CPU usage of about 10% in htop.

Then, with jpegdec:

GST_DEBUG=2 gst-launch-1.0 filesrc location=mjpeg-4k-test.mkv ! matroskademux ! jpegparse ! jpegdec ! fakesink sync=true

This saturates the thread - 100% CPU.

So, in your case, I suspect that it might be USB related, actually. I do not know if USB 3.0 suffers from the same CPU usage problem as USB 2.0 does (that is, the CPU has to parse the USB packets, which is costly when large 4K frames are sent through those packets).
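One way to test the USB hypothesis might be to compare a capture-only pipeline (no decoder) against the two decoder variants. The sketch below only prints the three command lines. The device and caps are taken from the original report; the `fakesink sync=true` sink is an assumption, since the reporter's command line is truncated, and `build_pipeline` is a hypothetical helper name.

```shell
# Print a gst-launch-1.0 command line for the given decoder element.
# An empty argument yields a capture-only baseline without any decoder.
# ASSUMPTION: fakesink is used as the sink; the reporter's real sink is unknown.
build_pipeline() {
    decoder=$1
    caps='image/jpeg,width=2560,height=1440,framerate=30/1'
    if [ -n "$decoder" ]; then
        echo "gst-launch-1.0 v4l2src device=/dev/video0 ! $caps ! $decoder ! fakesink sync=true"
    else
        echo "gst-launch-1.0 v4l2src device=/dev/video0 ! $caps ! fakesink sync=true"
    fi
}

build_pipeline ""               # capture only: USB + v4l2src cost
build_pipeline jpegdec          # software decoder
build_pipeline imxvpudec_jpeg   # VPU decoder
```

If the capture-only variant already accounts for most of the CPU usage seen in htop, the decoder is probably not the main cost.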

If you can, produce the following:

  1. Create dot dumps by setting the GST_DEBUG_DUMP_DOT_DIR environment variable to /tmp/. Then, collect the .dot files in /tmp/, and attach them here.
  2. Run your pipeline with the GST_DEBUG environment variable set to 2,*imx*:5.
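The two steps above can be combined in a small wrapper, sketched here. `run_with_diagnostics` is a hypothetical helper name, and the log file name in the usage comment is likewise only an example; the pipeline itself is whatever command line you normally run.

```shell
# Run any command with both diagnostics from the list above enabled:
#   - dot dumps written to /tmp/
#   - debug level 2 globally, level 5 for categories matching *imx*
# The GST_DEBUG value is quoted so the shell does not expand the asterisks.
run_with_diagnostics() {
    GST_DEBUG_DUMP_DOT_DIR=/tmp/ GST_DEBUG='2,*imx*:5' "$@"
}

# Usage (commented out; substitute your own pipeline after gst-launch-1.0):
#   run_with_diagnostics gst-launch-1.0 <your pipeline> 2> gst-debug.log
#   ls /tmp/*.dot
```

GStreamer writes its debug output to stderr by default, which is why the usage comment redirects stream 2 to a file.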

dv1 avatar Jun 28 '23 19:06 dv1