Performance optimizations for JPEG decompression
This PR reduces JPEG decompression time by roughly a factor of 2 for larger images.
First, we use the TurboJPEG library directly rather than through OpenCV's imdecode() function. TurboJPEG is part of libjpeg-turbo and is therefore already installed on most systems. Using TurboJPEG has two advantages:
- RGB/BGR conversions can be fused with the YUV-to-RGB conversion that JPEG decompression requires anyway.
- TurboJPEG's API makes it easy to achieve zero-copy operation.
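To illustrate the first point, here is a minimal standalone sketch (not TurboJPEG's actual implementation) of why the channel order comes for free when it is fused with the color conversion. It assumes interleaved full-range YCbCr input with the JFIF (BT.601) coefficients; the names `ChannelOrder` and `ycbcrToPacked` are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only.
enum class ChannelOrder { RGB, BGR };

// Round to nearest and clamp to the valid 8-bit range.
static std::uint8_t clampToByte(float v) {
  return static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
}

// Converts interleaved full-range YCbCr pixels (JFIF / BT.601) to packed
// RGB or BGR in a single pass. The requested channel order only changes
// the index each result is stored at, so no second swap pass is needed.
std::vector<std::uint8_t> ycbcrToPacked(const std::vector<std::uint8_t>& ycbcr,
                                        ChannelOrder order) {
  assert(ycbcr.size() % 3 == 0);
  std::vector<std::uint8_t> out(ycbcr.size());
  const std::size_t rIdx = (order == ChannelOrder::RGB) ? 0 : 2;
  const std::size_t bIdx = 2 - rIdx;
  for (std::size_t i = 0; i < ycbcr.size(); i += 3) {
    const float y = ycbcr[i];
    const float cb = ycbcr[i + 1] - 128.0f;
    const float cr = ycbcr[i + 2] - 128.0f;
    out[i + rIdx] = clampToByte(y + 1.402f * cr);
    out[i + 1] = clampToByte(y - 0.344136f * cb - 0.714136f * cr);
    out[i + bIdx] = clampToByte(y + 1.772f * cb);
  }
  return out;
}
```

Calling `ycbcrToPacked(pixels, ChannelOrder::BGR)` costs the same as the RGB variant, whereas decoding to one order and swapping afterwards requires an extra read and write of the whole image.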
We remove two data copies: one during RGB/BGR conversion and one during the final conversion to sensor_msgs::Image.
On my system, this PR allows me to decode a 4K JPEG stream (encoded outside of ROS) at 30 Hz without problems. Without the PR, I could achieve at most 20 Hz.
I mostly tested with the stream configuration I had available. I'll run further RGB/BGR tests soon. Since there are no unit tests for compressed_image_transport (or I didn't find any), this change needs careful review.