Adding / importing JPEG filter

Open petoor opened this issue 4 years ago • 19 comments

Hi.

I've been looking at this lossy JPEG filter: https://github.com/CARS-UChicago/jpegHDF5, which I'd very much like to try. Is it possible to add it to, or import it with, the hdf5plugin module?

Best regards Peter

petoor avatar Sep 11 '20 10:09 petoor

Hi,

It sounds possible to add it to hdf5plugin. The only issue I see is that jpegHDF5 seems to have no license, but that can hopefully be tackled by contacting the author.

If you feel like it, a pull request is welcome!

See https://github.com/silx-kit/hdf5plugin/blob/master/doc/contribute.rst and of course, we can provide support.

t20100 avatar Sep 11 '20 13:09 t20100

Besides the license, which should not be a big issue, the main question I see is whether we build the libjpeg library ourselves or rely on the system's. System libjpeg libraries seem to be libjpeg-turbo, which has plenty of compilation options for the different architectures; handling those on our side would be a nightmare.

vasole avatar Sep 11 '20 14:09 vasole

Indeed, that can be a bit complicated. So far we've been embedding the source of the codec libs in the repository to ease installation from source.

For generating the wheels, libjpeg-turbo is in the manylinux docker, so it should be possible to make wheels which will embed the libjpeg. For building from source, either we need to embed the source of libjpeg or libjpeg-turbo and write the correct extension/lib in setup.py... or we leave this filter optional and requiring libjpeg already installed on the system.

t20100 avatar Sep 11 '20 15:09 t20100

I guess the simplest solution is to provide the reference old implementation with the possibility for the user to compile and link against the system libjpeg. That way the default implementation would be 2x or 6x slower but it would still work.

That filter seems to work only for 8-bit integers.

vasole avatar Sep 11 '20 15:09 vasole

I reached out to the author and he has now added a licence to the file (Apache 2). Is there an easy way to link libjpeg to hdf5plugin? I don't mind giving this a go myself, but C is not really my strong suit.

petoor avatar Sep 13 '20 07:09 petoor

I have tried to incorporate the plugin into our hdf5plugin building chain.

It is straightforward if one can rely on an installed version of libjpeg-turbo.

We have to take a decision. After all, it is a filter just for uint8 data.

vasole avatar Sep 13 '20 13:09 vasole

Alright, that sounds good. I would love to check it out; how do I do that? uint8 data is used a lot in machine learning, especially computer vision. I think it would be useful for a lot of people to have a uint8 filter.

petoor avatar Sep 13 '20 13:09 petoor

The only thing you should need to use this filter is a shared library (or DLL) containing the filter, placed in the directory pointed to by the HDF5_PLUGIN_PATH environment variable.
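To check what HDF5 could pick up from that variable, a small Python sketch like the following may help. It only lists shared-library files in each directory of the path; the function name and the suffix list are illustrative assumptions, and it does not verify that a file is actually an HDF5 filter plugin.

```python
import os
from pathlib import Path

# File suffixes commonly used for shared libraries on Linux, macOS and Windows.
# ASSUMPTION: versioned names such as "libfoo.so.9" are deliberately ignored here.
LIBRARY_SUFFIXES = {".so", ".dylib", ".dll"}


def list_plugin_libraries(plugin_path):
    """Return the shared-library files found in each directory of plugin_path."""
    found = []
    for directory in plugin_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries so an unset variable lists nothing
        folder = Path(directory)
        if not folder.is_dir():
            continue
        for entry in sorted(folder.iterdir()):
            if entry.is_file() and entry.suffix.lower() in LIBRARY_SUFFIXES:
                found.append(entry)
    return found


if __name__ == "__main__":
    for library in list_plugin_libraries(os.environ.get("HDF5_PLUGIN_PATH", "")):
        print(library)
```

If the list is empty, the variable is probably unset or points at the wrong directory.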

Here is an example where I have an existing HDF5 file that I saved with the JPEG plugin. I am using a very old version of h5dump (1.8.12), so it cannot possibly know about the JPEG filter.

(base) corvette:~/scratch>/usr/bin/h5dump --version
h5dump: Version 1.8.12

This is the contents of the HDF5 file:

(base) corvette:~/scratch>/usr/bin/h5dump --contents test_hdf5_mono_jpeg_q90_326.h5
HDF5 "test_hdf5_mono_jpeg_q90_326.h5" {
FILE_CONTENTS {
 group      /
 group      /entry
 group      /entry/data
 dataset    /entry/data/data
 group      /entry/instrument
 group      /entry/instrument/NDAttributes
 dataset    /entry/instrument/NDAttributes/AcquireTime
 dataset    /entry/instrument/NDAttributes/AttributesFileNative
 dataset    /entry/instrument/NDAttributes/AttributesFileParam
 dataset    /entry/instrument/NDAttributes/AttributesFileString
 dataset    /entry/instrument/NDAttributes/CameraManufacturer
 dataset    /entry/instrument/NDAttributes/CameraModel
 dataset    /entry/instrument/NDAttributes/E
 dataset    /entry/instrument/NDAttributes/Gettysburg
 dataset    /entry/instrument/NDAttributes/ID_Energy
 dataset    /entry/instrument/NDAttributes/ID_Energy_EGU
 dataset    /entry/instrument/NDAttributes/ImageCounter
 dataset    /entry/instrument/NDAttributes/MaxSizeX
 dataset    /entry/instrument/NDAttributes/MaxSizeY
 dataset    /entry/instrument/NDAttributes/NDArrayEpicsTSSec
 dataset    /entry/instrument/NDAttributes/NDArrayEpicsTSnSec
 dataset    /entry/instrument/NDAttributes/NDArrayTimeStamp
 dataset    /entry/instrument/NDAttributes/NDArrayUniqueId
 dataset    /entry/instrument/NDAttributes/Pi
 dataset    /entry/instrument/NDAttributes/RingCurrent
 dataset    /entry/instrument/NDAttributes/RingCurrent_EGU
 dataset    /entry/instrument/NDAttributes/Ten
 group      /entry/instrument/detector
 group      /entry/instrument/detector/NDAttributes
 dataset    /entry/instrument/detector/NDAttributes/ColorMode
 dataset    /entry/instrument/detector/data -> /entry/data/data
 group      /entry/instrument/performance
 dataset    /entry/instrument/performance/timestamp
 }
}

This is h5dump -p which shows the filter information.

(base) corvette:~/scratch>/usr/bin/h5dump -p -d /entry/data/data test_hdf5_mono_jpeg_q90_326.h5 | more
HDF5 "test_hdf5_mono_jpeg_q90_326.h5" {
DATASET "/entry/data/data" {
   DATATYPE  H5T_STD_U8LE
   DATASPACE  SIMPLE { ( 1024, 1024 ) / ( 1024, 1024 ) }
   STORAGE_LAYOUT {
      CHUNKED ( 1024, 1024 )
      SIZE 107505 (9.754:1 COMPRESSION)
   }
   FILTERS {
      USER_DEFINED_FILTER {
         FILTER_ID 32019
         COMMENT jpeg; see https://github.com/CARS-UChicago/jpegHDF5
         PARAMS { 90 1024 1024 0 }
      }
   }
   FILLVALUE {
      FILL_TIME H5D_FILL_TIME_IFSET
      VALUE  0
   }
   ALLOCATION_TIME {
      H5D_ALLOC_TIME_INCR
   }
   DATA {
   (0,0): 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204,
   (0,13): 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217,
   (0,26): 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230,
   (0,39): 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243,
   (0,52): 244, 245, 246, 247, 248, 250, 252, 248, 252, 253, 255, 253, 0, 1,
   (0,66): 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
   (0,84): 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
   (0,100): 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
   (0,116): 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
   (0,132): 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83,
   (0,148): 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,
   (0,164): 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112,
   (0,177): 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125,
   (0,190): 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138,
   (0,203): 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151,
   (0,216): 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164,
   (0,229): 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177,
   (0,242): 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190,
   (0,255): 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203,
   (0,268): 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216,
   (0,281): 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229,
   (0,294): 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242,
(base) corvette:~/scratch>

So it finds the JPEG plugin, prints the correct information, and decodes the data correctly.
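Since the filter is lossy, "correctly" here means close to the original rather than bit-exact. As a rough check, the sketch below compares one line of the h5dump output above with a wrapping ramp. That the original row really is such a ramp is an assumption, inferred only from the surrounding decoded values.

```python
# Decoded values at columns 52..65 of row 0, copied from the h5dump output above.
decoded = [244, 245, 246, 247, 248, 250, 252, 248, 252, 253, 255, 253, 0, 1]
first_column = 52

# ASSUMPTION: the original row is a ramp starting at 192 that wraps around at 256.
expected = [(192 + first_column + i) % 256 for i in range(len(decoded))]

# Per-pixel difference between decoded and assumed original values.
errors = [d - e for d, e in zip(decoded, expected)]
max_abs_error = max(abs(err) for err in errors)

print("errors:", errors)
print("max abs error:", max_abs_error)
```

Under that assumption, the only deviations sit just before the 255 to 0 wrap, which is where a sharp edge would be expected to produce JPEG artifacts.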

This is a 1024x1024 UInt8 image, so it would be about 1MB if it were not compressed. It was compressed with Quality=90, and this is the actual size:

(base) corvette:~/scratch>ls -lh test_hdf5_mono_jpeg_q90_326.h5
-rw-rw-r-- 1 epics domain users 182K May  8  2019 test_hdf5_mono_jpeg_q90_326.h5
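The arithmetic behind the ratio reported by h5dump -p can be reproduced in a few lines of Python. The numbers come from the output above. The file on disk (182K) is larger than the compressed dataset itself, presumably because it also holds the attribute datasets and HDF5 metadata; that explanation is an assumption.

```python
raw_bytes = 1024 * 1024 * 1   # 1024 x 1024 pixels, 1 byte per uint8 pixel
stored_bytes = 107505         # SIZE reported by h5dump -p for /entry/data/data

ratio = raw_bytes / stored_bytes
print(f"raw: {raw_bytes} bytes, stored: {stored_bytes} bytes, ratio: {ratio:.3f}:1")
```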

This is my HDF5_PLUGIN_PATH

(base) corvette:~/scratch>echo $HDF5_PLUGIN_PATH
/home/epics/devel/areaDetector/ADSupport/lib/linux-x86_64

These are the jpeg files in that directory.

(base) corvette:~/scratch>ls -lh $HDF5_PLUGIN_PATH/*jpeg*
-r-xr-xr-x 1 epics domain users  14K Jul  7 12:11 /home/epics/devel/areaDetector/ADSupport/lib/linux-x86_64/libHDF5_jpeg_plugin.so
-r--r--r-- 1 epics domain users 386K Jul  7 12:10 /home/epics/devel/areaDetector/ADSupport/lib/linux-x86_64/libjpeg.a
-r-xr-xr-x 1 epics domain users 311K Jul  7 12:10 /home/epics/devel/areaDetector/ADSupport/lib/linux-x86_64/libjpeg.so

In my case I am building libjpeg from source and putting it in that directory. I do that so I can ensure that libjpeg is available, in a working version, on all architectures that I build for. This includes 32 and 64-bit Windows, 32 and 64-bit Linux, MacOS, vxWorks and a number of others. It saves my users from needing to find and install the libjpeg library. But this is not necessary; it should also work fine with the system version of libjpeg.

MarkRivers avatar Sep 13 '20 14:09 MarkRivers

@MarkRivers, what sources of libjpeg are you using?

vasole avatar Sep 13 '20 14:09 vasole

@petoor

The jpeg branch, https://github.com/silx-kit/hdf5plugin/tree/jpeg, shows what it would look like to add the plugin using a static version of libjpeg-turbo as the JPEG library. That code is just for illustration purposes and works only on Windows. I do not think we'll integrate the filter.

The main interest of hdf5plugin is that it decouples the version of the HDF5 library used when building the plugin from the HDF5 version available when using the plugin. If the plugins supplied by Mark can be used with multiple versions of HDF5, then there is little interest in adding this filter to our list. You can just take the plugin from him.

vasole avatar Sep 13 '20 14:09 vasole

> @MarkRivers, what sources of libjpeg are you using?

The repository where I build libjpeg is here: https://github.com/areaDetector/ADsupport

It builds the following libraries:

  • Bitshuffle and lz4
  • Blosc
  • CBF
  • GraphicsMagick
  • HDF5
  • JPEG
  • netCDF
  • NeXus
  • SZIP
  • TIFF
  • XML2
  • ZLIB

Each directory has a README.epics that says what version of the source code is used and any modifications made. The Makefile is always new, because the builds are done using the EPICS build system for OS-independence. In many cases minor changes were made to the source code to allow it to be built on vxWorks, etc.

When building areaDetector one can select whether to use the system version of any library, or to use the version built in ADSupport. The ADSupport versions have the following advantages:

  • Version is known to work with areaDetector plugins.
  • Additional operating systems are supported compared to the original version.
  • No need to involve system administrators in installing packages to allow areaDetector to be built.

MarkRivers avatar Sep 13 '20 15:09 MarkRivers

@MarkRivers Thank you.

I see that you are building with JPEG version 9 compatibility, whereas the default compatibility mode for libjpeg-turbo is 6.2:

https://github.com/areaDetector/ADSupport/blob/4767b1afcaa676045d4bf9ee68a25448bb8a0b58/supportApp/jpegSrc/os/default/jpeglib.h#L40

Clearly if one wants to remain compatible with the source (you), only your sources have to be used.

vasole avatar Sep 13 '20 15:09 vasole

> Clearly if one wants to remain compatible with the source (you), only your sources have to be used.

I am not sure that is true. I shared my JPEG plugin with the HDF Group, but not my version of libjpeg built from source. They added it to the HDF5 distribution, both the HDF5 source and plugin binaries, and they tested it. But they must have tested with some system version of libjpeg, because they did not use my libjpeg source. Maybe the API for the functions the plugin uses has not changed between 6.2 and 9?

MarkRivers avatar Sep 13 '20 15:09 MarkRivers

It could well be. Not being an expert, I do not know whether the incompatibilities can affect the output.

I have been able to build your plugin against libjpeg-turbo built with 6.2 compatibility mode (its default). However, unless you perform a systematic check to verify it, I would not take the risk of using libraries built with different compatibility settings.

The modern JPEG libraries can be built with different compatibility settings, so perhaps it is enough for you to specify the compatibility level you target.

vasole avatar Sep 13 '20 15:09 vasole

> Maybe the API for the functions the plugin uses has not changed between 6.2 and 9?

6.2 and 9 are maintained by different organizations. 9 is maintained by the Independent JPEG group: https://www.ijg.org/. 6.2 appears to be a dead-end with no further development.

In October 2016 I updated ADSupport from JPEG 6.2 to 9b. However, at that time I did not need to make any changes to the areaDetector JPEG file writing plugin: https://github.com/areaDetector/ADCore/blob/master/ADApp/pluginSrc/NDFileJPEG.cpp. This tells me that the API did not change between 6.2 and 9b.

MarkRivers avatar Sep 13 '20 15:09 MarkRivers

If you do not use any extension mentioned in https://en.wikipedia.org/wiki/Libjpeg it should be fine.

vasole avatar Sep 13 '20 15:09 vasole

I managed to use the jpeg filter following your guide, @MarkRivers. When I call it from the h5py wrapper with the compression=32019 argument, it also compresses the file. Does anyone know how to call the filter with cd_values in order to change the compression quality?

With hdf5plugin I guess these would be passed as keyword arguments, as in **hdf5plugin.jpeg, but I can't really find that in the jpeg branch.
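One way to pass the quality through plain h5py, sketched under assumptions: create_dataset accepts a numeric filter ID as compression and a tuple of integers as compression_opts, which HDF5 hands to the filter as cd_values. The order (quality, columns, rows, color mode) and the 1 to 100 quality range are inferred from the PARAMS { 90 1024 1024 0 } line in the h5dump output earlier in this thread and should be checked against the jpegHDF5 source. The helper names are made up for illustration, and the plugin must be reachable through HDF5_PLUGIN_PATH.

```python
JPEG_FILTER_ID = 32019  # filter ID shown by h5dump -p earlier in the thread


def jpeg_filter_kwargs(shape, quality=90):
    """Build h5py create_dataset keyword arguments for one 2D mono uint8 image.

    ASSUMPTION: cd_values are (quality, n_columns, n_rows, color_mode), with
    color_mode 0 meaning monochrome; verify against the jpegHDF5 source.
    """
    if len(shape) != 2:
        raise ValueError("this sketch only handles 2D monochrome images")
    if not 1 <= quality <= 100:
        raise ValueError("quality is assumed to lie between 1 and 100")
    n_rows, n_columns = shape
    return {
        "chunks": (n_rows, n_columns),  # one chunk per image, as in the h5dump output
        "compression": JPEG_FILTER_ID,
        "compression_opts": (quality, n_columns, n_rows, 0),
    }


def write_jpeg_dataset(path, image, quality=90):
    """Write a 2D uint8 NumPy array with the JPEG filter (needs h5py and the plugin)."""
    import h5py  # imported here so the helper above works without h5py installed

    with h5py.File(path, "w") as h5file:
        h5file.create_dataset(
            "data", data=image, **jpeg_filter_kwargs(image.shape, quality)
        )
```

For example, write_jpeg_dataset("test.h5", image, quality=75) would request quality 75 for a uint8 array named image.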

petoor avatar Sep 14 '20 15:09 petoor

I added a commit (https://github.com/silx-kit/hdf5plugin/commit/777f14a0af5c9106089b6c3430be290fef53c3b3) to the jpeg branch with the handling of arguments.

t20100 avatar Sep 14 '20 15:09 t20100

Thank you Thomas.

It seems to be working :-) The images compressed with this filter are much smaller than those compressed with gzip (no wonder, since it is lossy compression). Storage-wise, that makes the h5 format competitive with storing the raw JPEG files.

petoor avatar Sep 15 '20 09:09 petoor