pdf.js-extract
Fix for Y coordinate with new version of pdfjs
Hi @ffalt, thank you very much for this project. I have been using your extractBuffer function successfully in a browser environment.
Working with pdfjs-dist v4.0.269, I noticed that the y coordinate is slightly off. If you are considering upgrading pdfjs, I had success calculating the y coordinate as follows:
// Assumes Util is imported from pdfjs-dist and viewport comes from page.getViewport(...)
page.getTextContent().then((content) => {
    // Content contains lots of information about the text layout and styles, but we need only strings at the moment
    pag.content = content.items.map((item) => {
        // Map the item's text matrix from PDF space into viewport space
        const tx = Util.transform(viewport.transform, item.transform);
        return {
            x: tx[4],
            y: tx[5] - item.height,
            str: item.str,
            dir: item.dir,
            width: item.width,
            height: item.height,
            fontName: item.fontName
        };
    });
});
This would replace the block that you currently have here.
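To make the arithmetic easier to check, here is a minimal standalone sketch of the same calculation. The `transform` function is a local stand-in that I believe matches what `Util.transform` does (multiplying two affine matrices in `[a, b, c, d, e, f]` form), and `topLeftOf` is a hypothetical helper name used only for illustration. The sample values are assumptions: an unrotated page 792 units high at scale 1, for which the viewport transform should be `[1, 0, 0, -1, 0, 792]`.

```javascript
// Local stand-in for pdf.js's Util.transform (an assumption, not the library code):
// multiplies two affine matrices given in [a, b, c, d, e, f] form.
function transform(m1, m2) {
  return [
    m1[0] * m2[0] + m1[2] * m2[1],
    m1[1] * m2[0] + m1[3] * m2[1],
    m1[0] * m2[2] + m1[2] * m2[3],
    m1[1] * m2[2] + m1[3] * m2[3],
    m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
    m1[1] * m2[4] + m1[3] * m2[5] + m1[5],
  ];
}

// Hypothetical helper: x/y of a text item in viewport space, with y moved
// up by the item height as in the snippet above.
function topLeftOf(item, viewportTransform) {
  const tx = transform(viewportTransform, item.transform);
  return { x: tx[4], y: tx[5] - item.height };
}

// Assumed sample values: unrotated page, 792 units high, scale 1.
const viewportTransform = [1, 0, 0, -1, 0, 792];
const item = { transform: [12, 0, 0, 12, 72, 700], height: 12 };

const pos = topLeftOf(item, viewportTransform);
console.log(pos); // { x: 72, y: 80 }
```

Under these assumptions `tx[5]` works out to `792 - 700 = 92`, i.e. the page height minus the PDF-space y, and subtracting `item.height` then gives 80. Because the viewport transform is applied rather than a hard-coded page height, the same code should also pick up any scale or rotation set on the viewport.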
I hope this helps.