Fix for Y coordinate with new version of pdfjs

Open MCMattia opened this issue 1 year ago • 4 comments

Hi @ffalt, thanks a lot for this project. I have been using your extractBuffer function successfully in a browser environment.

With pdfjs-dist v4.0.269, I noticed that the y coordinate comes out slightly wrong. If you are considering upgrading pdfjs, I had success calculating the y coordinate as follows:

page.getTextContent().then((content) => {
	// Content contains lots of information about the text layout and styles, but we need only strings at the moment
	pag.content = content.items.map((item) => {
		const tx = Util.transform(viewport.transform, item.transform);
		return {
			x: tx[4],
			y: tx[5] - item.height,
			str: item.str,
			dir: item.dir,
			width: item.width,
			height: item.height,
			fontName: item.fontName
		};
	});
})

This would replace the block that you currently have here
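
To show what the transform does to the numbers, here is a minimal standalone sketch that does not load pdfjs. `multiplyAffine` is a hypothetical local helper that I assume mirrors the 6-element matrix product done by `Util.transform`, and `toTopLeft` is a made-up name for the mapping in the snippet above. The viewport matrix `[1, 0, 0, -1, 0, 792]` is also an assumption: scale 1, no rotation, and a 792pt-high page whose view box starts at the origin.

```javascript
// Hypothetical helper: product of two affine matrices in the [a, b, c, d, e, f]
// layout. Assumed to match what Util.transform(m1, m2) computes.
function multiplyAffine(m1, m2) {
	return [
		m1[0] * m2[0] + m1[2] * m2[1],
		m1[1] * m2[0] + m1[3] * m2[1],
		m1[0] * m2[2] + m1[2] * m2[3],
		m1[1] * m2[2] + m1[3] * m2[3],
		m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
		m1[1] * m2[4] + m1[3] * m2[5] + m1[5]
	];
}

// Hypothetical name for the mapping above: move the item into viewport space,
// then subtract its height so y refers to the top edge instead of the baseline.
function toTopLeft(viewportTransform, item) {
	const tx = multiplyAffine(viewportTransform, item.transform);
	return { x: tx[4], y: tx[5] - item.height };
}

// Assumed viewport: scale 1, no rotation, page height 792pt (flips the y axis).
const viewportTransform = [1, 0, 0, -1, 0, 792];
// Made-up text item: 12pt text with its baseline at (72, 720) in PDF space.
const item = { transform: [12, 0, 0, 12, 72, 720], height: 12 };

const pos = toTopLeft(viewportTransform, item);
console.log(pos); // { x: 72, y: 60 }
```

The baseline sits at 792 - 720 = 72 from the top of the page, and subtracting the item height of 12 gives 60 for the top edge of the text.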

I hope this helps.

MCMattia, Dec 22 '23 09:12