
How to vectorize bitmaps or convert to grayscale?

Open MAJAQA opened this issue 3 years ago • 14 comments

As a senior QA engineer I need to inspect differences in high-resolution monochrome TIFF files. Currently, as a tech-time project, I am using the pyvips Python library to compare very large monochrome TIFF files (other libraries simply fail due to memory issues), producing a monochrome difference TIFF for each pair. If you then have to inspect e.g. 2000 of those difference files, that is challenging, and you never know which difference is the most severe. Now I want to be able to:

  • have a method to convert each 16 x 16 pixel bitmap tile to an 8-bit grayscale tone (the number of black pixels in the 16 x 16 tile is the 8-bit number that determines the grayscale tone). This gives a lower-resolution TIFF in which the amount of difference shows up as a grey tone, quickly revealing, e.g. in an explorer view, which of the 2000 difference files has the most black pixels. I could not come up with a simple method to do this...
  • 'quantify' and get some statistics on the number and the size of the differences found, instead of having to open and visually inspect them. Does anyone have experience converting a bitmap (TIFF) to vectors? svgload() exists in pyvips, but I need the opposite, or e.g. something that can find clusters of 'touching' black pixels and give their circumference. Ultimately I'd like statistics on the clusters found: how many, the average circumference, the biggest circumference, ...

MAJAQA avatar Aug 12 '21 20:08 MAJAQA

Hello @MAJAQA,

  1. libvips uses 0 and 255 for mono images, so you just need to average each 16 x 16 block. In Python, it's small = big.shrink(16, 16).
  2. You could find the difference between the two images, then threshold and look for connected regions. Use labelregions to segment into connected areas, then hist_find_indexed to measure the size of each connected group (see the sketch below).
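
A minimal sketch of both steps, assuming two 0/255 mono images; "ref.tif", "new.tif" and the 16-pixel block size are placeholders, and the ushort cast assumes fewer than 65,536 regions (hist_find_indexed needs a uchar or ushort index image):

import pyvips

# labelregions scans the whole image, so open in random-access mode (the default)
a = pyvips.Image.new_from_file("ref.tif")
b = pyvips.Image.new_from_file("new.tif")

# 0/255 mask of pixels that differ in any band
mask = (a != b).bandbool("or")

# step 1: each 16 x 16 block becomes one 8-bit grey pixel (the block mean)
mask.shrink(16, 16).write_to_file("overview.tif")

# step 2: label connected regions of equal pixels
labels, opts = mask.labelregions(segments=True)
print("regions found:", opts["segments"])

# a histogram of a constant-1 image, indexed by the label image, gives the
# pixel count (area) of every region
ones = mask.new_from_image(1)
hist = ones.hist_find_indexed(labels.cast("ushort"))

# getpoint is fine for modest region counts; for many thousands of regions,
# pull the histogram out in one go with write_to_memory instead
areas = [hist.getpoint(x, 0)[0] for x in range(hist.width)]

# the largest region is normally the unchanged background
print(sorted(areas, reverse=True)[:10])

From the areas you can derive the count, mean and maximum difference size; circumference would need a further step, such as counting edge pixels per region.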

jcupitt avatar Aug 12 '21 21:08 jcupitt

Hi @jcupitt,

  1. just amazing 1 line of code delivers what I need!
  2. thanks for the tip; will definitely try this out and report back (most likely after my holidays) if I get something working

MAJAQA avatar Aug 12 '21 21:08 MAJAQA

Hi John,

I’m back from holiday…

The shrink is working fine, thanks again!

Trying labelregions gives an error: @.*** (the screenshot got lost in the mail reply; see the next comment)

I’m pretty new to Python and libvips, so it could be that I am not using the correct syntax ☹ or that labelregions is not possible on a banded image... This is the piece of code I use:

import pyvips as py

def ImagePy_Compare(ref, new, difff):
    print('Comparing ' + ref + ' with ' + new + ' ...')

    a = py.Image.new_from_file(ref, access="sequential")
    b = py.Image.new_from_file(new, access="sequential")

    # a != b makes an N-band image with 0/255 for false/true ... we have to OR the
    # bands together to get a 1-band mask image which is true for pixels which
    # differ in any band
    mask = (a != b).bandbool("or")
    mask = mask.invert()

    # try to find regions in the difference file
    regions = mask.labelregions(segments=True)

Hope you can help, Marc Janssen

MAJAQA avatar Sep 02 '21 14:09 MAJAQA

Seems the image (error message) somehow got lost in the mail reply..., so adding it here: [screenshot]

MAJAQA avatar Sep 02 '21 14:09 MAJAQA

Hi again, it should work. For two ordinary RGB jpeg images I see:

$ python3
Python 3.9.5 (default, May 11 2021, 08:20:37) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvips
>>> a = pyvips.Image.new_from_file("k2.jpg")
>>> b = pyvips.Image.new_from_file("k4.jpg")
>>> mask = (a != b).bandor()
>>> result = mask.labelregions(segments=True)
>>> result
[<pyvips.Image 1772x2719 int, 1 bands, multiband>, {'segments': 5}]
>>> 

So, besides the one big region of differing pixels, there are four regions of exactly equal pixels.

jcupitt avatar Sep 02 '21 17:09 jcupitt

Maybe the reason is I 'have to' use Python 2? I use:

vipshome = r'C:\vips-dev-w64-web-8.11.0\vips-dev-8.11\bin'
os.environ['PATH'] = vipshome + ';' + os.environ['PATH']

to get pyvips running... The regression team (using SQUISH) has not moved to Python 3 yet; I still need to check why.

MAJAQA avatar Sep 02 '21 19:09 MAJAQA

It should work with python2 as well. I had a go:

$ sudo apt install python2-dev
$ curl https://bootstrap.pypa.io/pip/2.7/get-pip.py --output get-pip.py
$ python2 get-pip.py
$ pip2 install pyvips
$ python2
Python 2.7.18 (default, Mar  9 2021, 11:09:26) 
[GCC 10.2.1 20210306] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyvips
>>> a = pyvips.Image.new_from_file("k2.jpg")
>>> b = pyvips.Image.new_from_file("k4.jpg")
>>> mask = (a != b).bandor()
>>> result = mask.labelregions(segments=True)
>>> result
[<pyvips.Image 1772x2719 int, 1 bands, multiband>, {u'segments': 5}]
>>> 

jcupitt avatar Sep 02 '21 19:09 jcupitt

You're right, I just tried with 2 small JPEG images and it indeed works:

R: [2021-09-02T21:30:29.781+0200] ('Comparing E:\maja\CompareTest\ref\bird1.jpg with ', 'E:\maja\CompareTest\new\bird1.jpg', ' ...')
R: [2021-09-02T21:30:29.781+0200] <pyvips.Image 1200x742 int, 1 bands, multiband>
R: [2021-09-02T21:30:29.781+0200] {u'segments': 54089}

So it must be related to the 1-bit TIFF image format then?

Other than that: you proposed using hist_find_indexed()... but I have no clue how to continue.

MAJAQA avatar Sep 02 '21 19:09 MAJAQA

How large are your 1-bit TIFFs?

labelregions needs a uint32 for each pixel, so it will use a lot of memory for large images.

jcupitt avatar Sep 02 '21 21:09 jcupitt

Our TIFF files are up to 10 Gpix (100 kpix both vertically and horizontally). That's why I ended up with pyvips to compare them.

MAJAQA avatar Sep 07 '21 05:09 MAJAQA

Sorry, labelregions on an image that size will need at least 40GB of RAM for the id array.

How about running labelregions on the 16 x 16 shrunk image? The id array for that should only need a few hundred MB of memory.
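
For example, as a sketch ("difference-mask.tif" stands for your saved 0/255 difference mask, and the > 0 threshold is an assumption: a block counts as 'different' as soon as it contains a single differing pixel):

import pyvips

mask = pyvips.Image.new_from_file("difference-mask.tif")

# shrink averages each 16 x 16 block, so re-threshold to get a 0/255 mask:
# any block containing at least one differing pixel becomes 255
small = mask.shrink(16, 16) > 0

labels, opts = small.labelregions(segments=True)
print("regions in the shrunk image:", opts["segments"])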

jcupitt avatar Sep 07 '21 08:09 jcupitt

Hi John, I was afraid you would answer something like that ;-) I also thought about using the shrunk image... I will have a go and see if I can use that path.

The whole idea of locating 'clusters' (regions) is to have the information at the high-resolution level though... An extreme case (which does happen): if only 1 difference pixel is found in each of two adjoining 16 x 16 tiles, and those pixels are not 'touching', the shrunk image will, I presume, merge them into one region, which is not correct...

Another case: 4 hi-res pixels that are not touching are not the same as 4 touching pixels (in any shape) within a 16 x 16 tile (they can even be split over 2 tiles), yet both give the same resulting tone in the shrunk image... In the latter case the 4 touching pixels should be considered a more severe difference.

Going further: the bigger a region's surface gets, the more severe the difference typically is, and even the 'shape' of a region can indicate a different kind of issue. Those are the things I would like to be able to trace automatically...

MAJAQA avatar Sep 07 '21 13:09 MAJAQA

Sorry, that's just the way the labelregions algorithm works. It's like a flood-fill from every pixel, spreading over identical pixels and stopping at different ones. You can imagine it needs to have the entire image in memory at once, and it has to store quite a large number for each pixel.

You could cut the image into sections and label each separately I guess. You'd need to walk the edge of each section and then try to connect it to its neighbours.
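
A rough sketch of that per-section idea (the 4096-pixel tile size is arbitrary, and the edge-walking step is left out, so a region crossing a tile boundary is counted once per tile it touches):

import pyvips

mask = pyvips.Image.new_from_file("difference-mask.tif")
tile = 4096  # a 4096 x 4096 int32 label image needs only about 64 MB

for y in range(0, mask.height, tile):
    for x in range(0, mask.width, tile):
        w = min(tile, mask.width - x)
        h = min(tile, mask.height - y)
        section = mask.crop(x, y, w, h)
        labels, opts = section.labelregions(segments=True)
        print("tile (%d, %d): %d regions" % (x, y, opts["segments"]))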

jcupitt avatar Sep 07 '21 15:09 jcupitt

I was just thinking about a similar approach. It won't be that simple to implement... Maybe splitting the image into fixed-size sections, so that labelregions can be used with acceptable memory usage, is a good start. Any tips are welcome (unfortunately for me this is a side-track project where I have limited resources).

MAJAQA avatar Sep 07 '21 16:09 MAJAQA