pdf.js-extract
Node.js library for extracting data from PDF files
It would be nice if the library could extract a list of images (identifiers/pathnames by page) and provide a way to extract some of them. Especially useful to see when a...
```ts
import { PDFExtract, PDFExtractOptions } from 'pdf.js-extract';

const pdfExtract = new PDFExtract();
const options: PDFExtractOptions = { };

async function main() {
  const res = await pdfExtract.extract('https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf', options);
}
...
```
Basically the title. I am using TypeScript, so maybe the types are not up to date, but I don't see a way to get the color information for a text
Hi @ffalt, thank you very much for this project. I have been successfully using your `extractBuffer` function in a browser environment. Working with pdfjs-dist V4.0.269, I noticed that the y...
Hi, is it possible to get the coordinates of each word in the PDF? The "Hello, world!" output is a chunk of words; I want to extract each word as one separate...
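One possible workaround for the per-word question, as a rough sketch only: split each extracted text item on whitespace and estimate every word's horizontal position from its character offset. The `TextItem` shape below is assumed to mirror the `x`, `y`, `str`, `width` and `height` fields of the extracted page content, and `splitIntoWords` and `WordBox` are hypothetical names, not part of the library. The estimate assumes all glyphs are equally wide, so it is only approximate for proportional fonts.

```typescript
// Assumed shape of one extracted text item (only the fields used here).
interface TextItem {
  x: number;
  y: number;
  str: string;
  width: number;
  height: number;
}

// Hypothetical result type: one box per word.
interface WordBox {
  word: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: splits a text item into words and estimates each
// word's x position and width, assuming every character has the same width.
function splitIntoWords(item: TextItem): WordBox[] {
  const words: WordBox[] = [];
  if (item.str.length === 0) {
    return words;
  }
  const charWidth = item.width / item.str.length;
  const pattern = /\S+/g; // runs of non-whitespace characters
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(item.str)) !== null) {
    words.push({
      word: match[0],
      x: item.x + match.index * charWidth,
      y: item.y,
      width: match[0].length * charWidth,
      height: item.height,
    });
  }
  return words;
}
```

Calling `splitIntoWords` on every entry of a page's content array would give one box per word. Punctuation stays attached to its word, and exact glyph metrics would be needed for precise positions.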
Just a proposal. It could probably be done better (async / promise). A quick way to get access to attachments in PDFs.
Breaking change: updates pdfjs to 4.1.392, which now uses ESM modules. Converted all code to use ESM modules and updated the tests. Tests all working.
I use TypeScript. It works with Next.js, right?