Binary Raster Support (JBIG/JBG/JBIG2)

Open ReedGraff opened this issue 4 months ago • 8 comments

Would love to see JBIG/JBG/JBIG2 implemented as supported files. They are relatively simple lossless raster formats (https://en.wikipedia.org/wiki/JBIG), seems like a fun challenge.

There is already an opensource C library, made by a Cambridge Prof, that supports this as well (https://www.cl.cam.ac.uk/~mgk25/jbigkit/). Here is the patent information (http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=19498&ICS1=35&ICS2=40&ICS3=).

ReedGraff avatar Aug 25 '25 06:08 ReedGraff

seems like a fun challenge.

If you're interested in creating a pull request, feel free to do so.

Here is the patent information

Looking at https://en.wikipedia.org/wiki/JBIG, I see

Doubts about patent licence requirements for JBIG1 implementations by IBM, Mitsubishi and AT&T prevented the codec from being widely implemented in open-source software.[2] For example, as of 2012, none of the commonly used web browsers supported it.

If you think that the patents are no longer a concern, please elaborate.

If nothing else, that Wikipedia paragraph makes it sound like this is a more obscure format. Why do you feel it should be supported by Pillow?

Do you have sample image that can be added to our test suite and distributed under our license?

radarhere avatar Aug 25 '25 07:08 radarhere

Well, I figure the more supported formats the merrier, ImageMagick supports it as well (https://imagemagick.org/script/formats.php#:~:text=Joint%20Bi%2Dlevel%20Image%20experts%20Group%20file%20interchange%20format), and it would be useful to me and I imagine others.

I can get some images etc...

I can start working on a python port, do y'all prefer a DLL / SO port or a fully pythonic port?...

ReedGraff avatar Aug 26 '25 18:08 ReedGraff

https://github.com/python-pillow/Pillow/pull/1938#issue-157621822

The decoders are all in C, a fast but unsafe language. Python is significantly slower, but safe. This is a tradeoff that we should be able to make. (in either direction)

radarhere avatar Aug 26 '25 20:08 radarhere

Ok, well I'm honestly not sure how helpful I'm going to be here since I don't know C well enough to manipulate it into any specific format y'all might need, also given I'm not too familiar with how y'all's library is structured.

However, I went ahead and wrote the C-extension wrapper code for the library and got it compiling on my Windows & Linux machines, which is good enough for my use case. The README.md has an example of loading the bitmap from a .jbg file into PIL and then outputting it to another format (it would still be nice to have this natively tied into PIL, and not a separate library): https://github.com/ReedGraff/jbigkitpy

Additionally, I added some .jbg & .jbig files for reference in the assets/ folder, which you may distribute. Also attached here: assets.zip

For anything beyond this I might need someone else to pick this up.

ReedGraff avatar Aug 26 '25 23:08 ReedGraff

I have stumbled upon this as well while reading PDF files with JBIG2 image streams in pypdf. When I looked into it back then, the main problem was a lack of available implementations which aren't under a viral license. In general, having Pillow support JBIG2 would be nice.

The current workaround/solution for pypdf is to use jbig2dec by Artifex, which is subject to AGPL-3.0 and thus very viral. Some testing has shown that pdfimages from poppler-utils generates PNG files which are "somehow better" (I have not compared the output directly, but running Tesseract OCR on the pdfimages files gave much better results than on the jbig2dec files).

https://www.cl.cam.ac.uk/~mgk25/jbigkit/

This seems to be limited to JBIG1? Additionally, it is subject to GPL-2.0-or-later, thus imposing a viral effect and making Pillow unusable in commercial applications.

If you think that the patents are no longer a concern, please elaborate.

As far as I am aware, the patents primarily covered the encoder parts and never really were an issue for the decoder part. Additionally, the known ones (see https://www.hlevkin.com/hlevkin/Standards/fcd14492.pdf) might be expired (https://redirect.github.com/agl/jbig2enc/issues/58), although my knowledge regarding patents is too limited to make any final judgement on this.

stefan6419846 avatar Aug 29 '25 18:08 stefan6419846

What are y'all thinking? Anyone open to working on this with more C experience? @stefan6419846 @radarhere

ReedGraff avatar Sep 19 '25 22:09 ReedGraff

I guess anyone who had started working on this would already have said so, and this is not trivial since it may require an implementation from scratch. Explicitly pinging users here most likely does not help with this. From my side, I have neither the necessary C experience nor the personal resources to do an actual implementation (otherwise I would already have looked into some alternative implementation for the libraries I maintain myself).

stefan6419846 avatar Sep 20 '25 08:09 stefan6419846

From a brief search, I found that the ISO, IEC, and ITU-T jointly published the JBIG2 specifications no later than 2001, and probably earlier. Except for unusual circumstances like prolonged delays in application processing, patents in the United States of America last 20 years at most. Therefore any patents that might possibly come into play ought to be expired here, and presumably in most other places that might have patents.

I would also like to elaborate on the use case. The ITU-T standardized JBIG and JBIG2 for use in fax standards, and specifically to bridge facsimile and digital communications. For faxes or scanned pages of text, bi-level images are perfect; making sure the text can be read and contrasted from the background is the main concern. Conceptually this is the same as the PBM image format. Of course JBIG and JBIG2 offer compression, and so the reason I sought to learn about the JBIG2 format is this: when scanning paper documents with a tool such as SANE, I can scan documents in grayscale. Afterwards, converting to a bi-level format such as JBIG2 will not only greatly reduce file size on its own, but (because going from grayscale to monochrome is basically a form of "downsampling", for lack of a better term; can you tell I don't have experience with this stuff? 🙃) it also gives a compression tool a lot of freedom to reduce noise or pick the cutoff levels appropriately (a user might raise the threshold for a pixel to become "black" to forgive smudging, for example). This could perhaps benefit compression even more, just as the tunable parameters in any lossy compression procedure can be chosen to attain trade-offs.
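
To make the cutoff idea concrete, here is a minimal sketch (not Pillow code and not a JBIG encoder) that thresholds 8-bit grayscale rows into an uncompressed bi-level bitmap in binary PBM (P4) form. The function name `threshold_to_pbm` and the default cutoff of 128 are assumptions chosen for illustration; a JBIG or JBIG2 encoder would then compress a bitmap like this one.

```python
def threshold_to_pbm(gray_rows, threshold=128):
    """Pack 8-bit grayscale rows into a binary PBM (P4) byte string.

    Hypothetical helper for illustration. Pixels whose value is below
    `threshold` become black (bit 1, as PBM defines it); all others
    become white. Each row is padded to a whole number of bytes.
    """
    height = len(gray_rows)
    width = len(gray_rows[0]) if height else 0
    out = bytearray(f"P4\n{width} {height}\n".encode("ascii"))
    for row in gray_rows:
        if len(row) != width:
            raise ValueError("all rows must have the same width")
        # Pack eight pixels per byte, most significant bit first.
        for start in range(0, width, 8):
            byte = 0
            for offset, value in enumerate(row[start:start + 8]):
                if value < threshold:
                    byte |= 0x80 >> offset
            out.append(byte)
    return bytes(out)


# A faint smudge (value 100) next to clean paper (value 200).
page = [[100, 200]]
default_cutoff = threshold_to_pbm(page)               # smudge turns black
stricter_cutoff = threshold_to_pbm(page, threshold=64)  # smudge stays white
```

In this sketch, lowering the numeric cutoff makes it harder for a pixel to count as black, which is one way of forgiving light smudges before compression.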

An appropriate monochrome image format is the "right tool for the job" for preserving scanned documents of text. PDF documents can (and do) contain JBIG2 images as a container format around these images, and Ghostscript, Poppler, and MuPDF all support it well in that case, at least for reading. The use of JBIG2 in PDFs is—if I recall correctly—prescribed by the ISO/IEC standards for PDF documents, so that's already a good reason for support to be available. I found my way to this issue (not having much of an idea what Pillow is—don't give me any hints!) because this seems to be what WeasyPrint—the HTML-to-PDF Python library and utility—uses to detect images and decide how to handle them.
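
As a rough illustration of what format detection could look like, standalone JBIG2 files begin with a fixed 8-byte signature, so a check along the following lines is possible. This is only a sketch: `looks_like_jbig2` is a hypothetical name, it is not an existing Pillow function, and JBIG2 data embedded in a PDF stream normally omits this file header, so the check applies to standalone files only.

```python
# 8-byte signature at the start of a standalone JBIG2 file:
# 0x97 'J' 'B' '2' CR LF 0x1A LF
JBIG2_MAGIC = b"\x97JB2\r\n\x1a\n"


def looks_like_jbig2(prefix: bytes) -> bool:
    """Return True if `prefix` starts with the JBIG2 file signature.

    Hypothetical helper for illustration; it inspects only the first
    eight bytes and does not validate the rest of the file.
    """
    return prefix.startswith(JBIG2_MAGIC)
```

For a file on disk, reading the first eight bytes in binary mode and passing them to this function would be enough to decide whether to hand the file to a JBIG2 decoder.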

Being able to create such PDF documents would be helpful. Consider, for example, a monochrome thermal printer that implements the Internet Printing Protocol and accepts PDF print jobs as is typical. For point-of-sale software or other applications designed with this equipment in mind, using a bi-level image format, perhaps encapsulated in the PDF, just makes sense. PDFs with JBIG2 are very appealing when adding a text overlay (perhaps made with OCR software, which might itself use Pillow), and superimposing it on the scanned image of a page. This way the analog content is effectively preserved but machine-readable information is there for typical convenience. When this is not important, JBIG2 can also be used inside a TIFF container.

In conclusion, the state-of-the-art of JBIG2 adoption isn't super pretty, but from a technical point of view it's just the right tool for its intended audience. It's prescribed and referenced by international standards and a good share of non-Pillow-using FLOSS applications are able to at least view these. Therefore I think it's quite deserving of implementation here.

I don't even know Python and it's doubtful I can do anything to help. My goal with this comment is to spell out the motive for the use of these formats and the sorts of users who are left hanging for now, such as myself. The information on this GitHub issue hasn't done justice to this seemingly little-known image type, so hopefully this will help any actions taken on this matter to be informed.

jscott0 avatar Dec 09 '25 00:12 jscott0