Option for generating extracted svg graphics only

Open kermitt2 opened this issue 2 years ago • 0 comments

By default, pdfalto extracts both embedded bitmaps and vector graphics. The option -noImage disables extraction of both graphics types. However, we might still want the vector graphics extracted without the bitmap images: bitmap image extraction can be time-consuming and is often not required by further processing (bitmap graphic objects with coordinates are present in the ALTO file even when the bitmap is not extracted), while the SVG files are necessary to further cluster the vector graphics.

Proposal:
-noImage -> unchanged, neither type of graphics is extracted
-noBitmapImage -> bitmap graphics are not extracted, but vector graphics still are
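
To make the intended semantics concrete, here is a minimal sketch of how the two flags could resolve into extraction decisions. It is written in Python for illustration only and is not pdfalto code; `ExtractionPlan` and `resolve_extraction_plan` are hypothetical names, and letting -noImage take precedence when both flags are given is an assumption, since the proposal does not say what should happen in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionPlan:
    """Hypothetical container for what should be written to disk."""
    extract_bitmaps: bool
    extract_vector_graphics: bool


def resolve_extraction_plan(no_image: bool = False,
                            no_bitmap_image: bool = False) -> ExtractionPlan:
    """Map the two proposed flags to extraction decisions.

    Assumption: -noImage wins if both flags are passed.
    """
    if no_image:
        # -noImage: unchanged behaviour, no graphics of either type.
        return ExtractionPlan(extract_bitmaps=False,
                              extract_vector_graphics=False)
    # -noBitmapImage only suppresses bitmaps; SVG output is kept.
    return ExtractionPlan(extract_bitmaps=not no_bitmap_image,
                          extract_vector_graphics=True)
```

With no flag set, both kinds of graphics are extracted, which matches the current default described above.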

kermitt2, Jul 08 '21 12:07