Added support for extraction of font boolean attributes like bold and…

Open aloknayak29 opened this issue 7 years ago • 3 comments

Added support for extraction of boolean font attributes such as bold and italic (from the TextFontInfo class). Note that in my experiments a True value for these attributes was always correct (no false positives), but a False value can be a false negative.

aloknayak29 avatar Nov 25 '18 18:11 aloknayak29

Thanks for the PR. See the detailed comments in the code for particular issues. If we are adding more font info from TextFontInfo, why not add the remaining attributes as well:

GBool isFixedWidth() 
GBool isSerif() 
GBool isSymbolic() 

izderadicka avatar Nov 26 '18 10:11 izderadicka

Could you also elaborate a bit on the false negatives? When do they happen? I actually use the font name to check for bold (in Python).

izderadicka avatar Nov 26 '18 10:11 izderadicka
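
For readers following along, the font-name check mentioned above could look roughly like the sketch below. This is not code from pdfparser: the function names and the keyword lists are hypothetical, and the heuristic only works when the font name actually carries the style (for example `Helvetica-BoldOblique`).

```python
import re

# Hypothetical keyword lists; real-world font names vary widely.
_BOLD_RE = re.compile(r"bold|black|heavy|demi", re.IGNORECASE)
_ITALIC_RE = re.compile(r"italic|oblique", re.IGNORECASE)


def strip_subset_prefix(font_name):
    """Drop a PDF subset tag such as 'ABCDEF+' from the start of a font name."""
    return re.sub(r"^[A-Z]{6}\+", "", font_name or "")


def guess_style_from_name(font_name):
    """Guess bold/italic from the font name alone (heuristic, may miss cases)."""
    name = strip_subset_prefix(font_name)
    return {
        "bold": bool(_BOLD_RE.search(name)),
        "italic": bool(_ITALIC_RE.search(name)),
    }
```

A name such as `helvetica` yields False for both keys, so this check cannot recover the style when the document reports a plain family name for visually bold text.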

Thanks for the code review. I will update my repo soon. In the cases where false negatives happened, checking the font name did not help either: e.g. in one case I was getting 'helvetica' as the font name for all words, even though some of them were visually bold. False negatives happened for is_italic as well. I don't know much about the significance of isFixedWidth(), isSerif(), and isSymbolic(). If possible, can you give me some references or usage examples for them?

aloknayak29 avatar Nov 26 '18 11:11 aloknayak29
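
As background on what these boolean attributes mean: the PDF specification defines a `/Flags` bit field in each font descriptor, with bits for FixedPitch, Serif, Symbolic, Italic and ForceBold, among others. The sketch below decodes such a value in Python. The bit positions come from the PDF specification; the dictionary and function names are hypothetical, and the assumption that the TextFontInfo getters mirror these bits has not been checked against the poppler source here. If that assumption holds, one plausible cause of false negatives is a PDF whose font descriptor simply does not set the relevant bit.

```python
# Bit positions (1-based) of selected font descriptor /Flags bits,
# as defined in the PDF specification.
FONT_FLAG_BITS = {
    "fixed_width": 1,   # FixedPitch
    "serif": 2,         # Serif
    "symbolic": 3,      # Symbolic
    "italic": 7,        # Italic
    "force_bold": 19,   # ForceBold
}


def decode_font_flags(flags):
    """Return a dict of booleans for the selected bits of a /Flags integer."""
    return {
        name: bool(flags & (1 << (bit - 1)))
        for name, bit in FONT_FLAG_BITS.items()
    }
```

For example, `decode_font_flags(66)` reports a serif, italic font, because 66 has bits 2 and 7 set.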