textract
node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
Bumps [got](https://github.com/sindresorhus/got) from 5.7.1 to 11.8.5. Release notes Sourced from got's releases. v11.8.5 Backport https://github.com/sindresorhus/got/commit/861ccd9ac2237df762a9e2beed7edd88c60782dc https://github.com/sindresorhus/got/compare/v11.8.4...v11.8.5 v11.8.3 Bump cacheable-request dependency (#1921) 9463bb6 Fix HTTPError missing .code property (#1739) 0e167b8 https://github.com/sindresorhus/got/compare/v11.8.2...v11.8.3...
Opening a 20 MB doc or docx file gets no response
Hi, it would be great if you supported text extraction for non-textual PDFs (for example, scanned documents) via OCR, in the same way you do for images. Thanks, Boaz
Bumps [marked](https://github.com/markedjs/marked) from 0.3.17 to 4.0.10. Release notes Sourced from marked's releases. v4.0.10 4.0.10 (2022-01-13) Bug Fixes security: fix redos vulnerabilities (8f80657) v4.0.9 4.0.9 (2022-01-06) Bug Fixes retain line breaks...
I want to get picture data from '.doc' and '.pdf' files. Right now I can only get text data. Is it possible to get picture data? Any ideas? Thank you!
Hi @dbashford A vulnerability has been reported on - cheerio-1.0.0-rc.2.tgz -> css-select-1.2.0.tgz -> nth-check-1.0.2.tgz nth-check is vulnerable to Inefficient Regular Expression Complexity https://www.whitesourcesoftware.com/vulnerability-database/CVE-2021-3803
Hi, A vulnerability has been reported on hosted-git-info. The package hosted-git-info before 3.0.8 is vulnerable to Regular Expression Denial of Service (ReDoS) via regular expression shortcutMatch in the fromUrl function...
Hi @dbashford A vulnerability has been reported on - meow-3.7.0.tgz -> trim-newlines-1.0.0.tgz The trim-newlines package before 3.0.1 and 4.x before 4.0.1 for Node.js has an issue related to regular expression...
Hi, A vulnerability has been reported on xmldom. xmldom is a pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module. xmldom versions 0.4.0 and older do...
Hi, A vulnerability has been reported on SheetJS. SheetJS and SheetJS Pro through 0.16.9 allow attackers to cause a denial of service (memory consumption) via a crafted .xlsx document that is...