textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!

Results 74 textract issues

Bumps [got](https://github.com/sindresorhus/got) from 5.7.1 to 11.8.5. Release notes Sourced from got's releases. v11.8.5 Backport https://github.com/sindresorhus/got/commit/861ccd9ac2237df762a9e2beed7edd88c60782dc https://github.com/sindresorhus/got/compare/v11.8.4...v11.8.5 v11.8.3 Bump cacheable-request dependency (#1921) 9463bb6 Fix HTTPError missing .code property (#1739) 0e167b8 https://github.com/sindresorhus/got/compare/v11.8.2...v11.8.3...

dependencies

Accessing a large (20 MB) doc or docx file gets no response

Hi, It would be great if you could support text extraction from non-textual PDFs (for example, scanned documents) using OCR, in the same way you do for images. Thanks, Boaz

Bumps [marked](https://github.com/markedjs/marked) from 0.3.17 to 4.0.10. Release notes Sourced from marked's releases. v4.0.10 4.0.10 (2022-01-13) Bug Fixes security: fix redos vulnerabilities (8f80657) v4.0.9 4.0.9 (2022-01-06) Bug Fixes retain line breaks...

dependencies

I want to get picture data from '.doc' and '.pdf' files. Right now I can only get text data. Is it possible to get picture data? Any ideas? Thank you!

Hi @dbashford, a vulnerability has been reported on cheerio-1.0.0-rc.2.tgz -> css-select-1.2.0.tgz -> nth-check-1.0.2.tgz. nth-check is vulnerable to Inefficient Regular Expression Complexity: https://www.whitesourcesoftware.com/vulnerability-database/CVE-2021-3803

Hi, a vulnerability has been reported on hosted-git-info. The hosted-git-info package before 3.0.8 is vulnerable to Regular Expression Denial of Service (ReDoS) via regular expression shortcutMatch in the fromUrl function...

Hi @dbashford, a vulnerability has been reported on meow-3.7.0.tgz -> trim-newlines-1.0.0.tgz. The trim-newlines package before 3.0.1 and 4.x before 4.0.1 for Node.js has an issue related to regular expression...

Hi, a vulnerability has been reported on xmldom. xmldom is a pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module. xmldom versions 0.4.0 and older do...

Hi, a vulnerability has been reported on SheetJS. SheetJS and SheetJS Pro through 0.16.9 allow attackers to cause a denial of service (memory consumption) via a crafted .xlsx document that is...