
Where is isSameSentence()?

Open CZHIC opened this issue 3 years ago • 3 comments

CZHIC avatar Jul 08 '21 02:07 CZHIC

I have the same question too!

patrickxchong avatar Aug 13 '21 10:08 patrickxchong

I'm not sure what the actual intended behaviour is, but this worked for me to some extent (although I ended up manually parsing the text output of GetTextByRow instead):

func isSameSentence(text pdf.Text, lastTextStyle pdf.Text) bool {
	return (text.Font == lastTextStyle.Font) && (text.FontSize == lastTextStyle.FontSize) && (text.X == lastTextStyle.X)
}

patrickxchong avatar Aug 15 '21 15:08 patrickxchong

For future visitors: the isSameSentence above isn't quite on the mark. Because it also compares X, consecutive characters on a line almost never match, so with that definition the example prints the font, font size, and x and y coordinates for each character of text in the PDF separately.

It might be more useful to treat text as part of the same sentence if it has the same font and font size, in which case the function definition you'd want would be:

func isSameSentence(text pdf.Text, lastTextStyle pdf.Text) bool {
	return (text.Font == lastTextStyle.Font) && (text.FontSize == lastTextStyle.FontSize)
}

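That still isn't really true to the name "sameSentence", so you may also want to check whether a period is already present in lastTextStyle.S before returning true (returning true is what appends the character to the text that gets printed alongside its text style). Once the accumulated text contains a period, the sentence has ended and the next character should start a new one:

// Needs "strings" in the import list.
func isSameSentence(text pdf.Text, lastTextStyle pdf.Text) bool {
	// S is the string content of a pdf.Text; a period in it means the sentence has ended.
	return (text.Font == lastTextStyle.Font) && (text.FontSize == lastTextStyle.FontSize) && !strings.Contains(lastTextStyle.S, ".")
}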
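
To make the effect of such a predicate easier to see, here is a minimal, self-contained sketch of a grouping loop driven by a predicate that requires the same font and font size and treats a period as the end of a sentence. It runs without the PDF library: the Text struct is a local stand-in that I'm assuming mirrors the relevant fields of pdf.Text, and groupRuns is a hypothetical helper of mine, not part of the library.

```go
package main

import (
	"fmt"
	"strings"
)

// Text is a local stand-in for the library's pdf.Text so the sketch runs
// without the PDF dependency. The field names are assumed to match.
type Text struct {
	Font     string
	FontSize float64
	X, Y     float64
	S        string
}

// isSameSentence reports whether next should be appended to the run held in
// last: same font, same size, and the run has not yet reached a period.
func isSameSentence(next, last Text) bool {
	return next.Font == last.Font &&
		next.FontSize == last.FontSize &&
		!strings.Contains(last.S, ".")
}

// groupRuns (hypothetical helper) merges consecutive fragments into runs.
// A fragment either extends the most recent run or starts a new one.
func groupRuns(texts []Text) []Text {
	var runs []Text
	for _, t := range texts {
		if n := len(runs); n > 0 && isSameSentence(t, runs[n-1]) {
			runs[n-1].S += t.S
			continue
		}
		runs = append(runs, t)
	}
	return runs
}

func main() {
	// Simulate per-character fragments, which is how the text tends to
	// arrive, followed by one fragment in a different font.
	var texts []Text
	for _, r := range "Hi. Ok" {
		texts = append(texts, Text{Font: "Helvetica", FontSize: 12, S: string(r)})
	}
	texts = append(texts, Text{Font: "Courier", FontSize: 10, S: "x"})

	for _, run := range groupRuns(texts) {
		fmt.Printf("Font: %s, Font-size: %.0f, content: %q\n", run.Font, run.FontSize, run.S)
	}
}
```

Note that a plain period check will also split on abbreviations and decimal numbers, so treat it as a starting point rather than real sentence detection.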

white0ut avatar Jan 15 '24 21:01 white0ut