amazon-textract-transformer-pipeline
Post-process Amazon Textract results with Hugging Face transformer models for document understanding
Bumps [pdfjs-dist](https://github.com/mozilla/pdfjs-dist) from 3.4.120 to 4.2.67. Commits: see full diff in compare view. Dependabot will resolve any conflicts with this PR as long as you don't alter...
When trying to run this cell, I got a 'sndfile library not found' error. Even after pip-installing the packages, the issue is still not resolved. Can anyone suggest how...
Recently heard from a user facing the following error on CDK deploy: Resource handler returned message: "The runtime parameter of go1.x is no longer supported for creating or updating AWS...
Bumps [tar](https://github.com/isaacs/node-tar) from 6.1.13 to 6.2.1. Changelog: sourced from tar's changelog. 7.0: Rewrite in TypeScript, provide ESM and CommonJS hybrid interface. Add tree-shake friendly exports, like import('tar/create') and import('tar/read-entry')...
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 4.5.2 to 4.5.3. Changelog: sourced from vite's changelog. 4.5.3 (2024-03-24): fix: fs.deny with globs with directories (#16250) (96a7f3a), closes #16250. Commits: aac695e release: v4.5.3; 96a7f3a fix: fs.deny...
We're aware that the `amazon-textract-transformer-pipeline-assets` S3 bucket used by the "Launch stack" button on the [root README](https://github.com/aws-samples/amazon-textract-transformer-pipeline?tab=readme-ov-file#getting-started) is no longer publicly accessible, and we are working to find a resolution... In the...
LILT
I wanted to ask whether this solution currently supports the Language-Independent Layout Transformer - RoBERTa model (LiLT). If not, I'd like to request that the inference code be updated to...
Today we demonstrate annotation and training for entity extraction only. For many use cases document classification is also important, and it should be pretty straightforward to support this too. A...
As of #26, users can train generative models to normalize entity text after extraction, for example to standardize date or currency formats, or to correct common OCR error patterns. This is...
As of now the [custom online human review UI](https://github.com/aws-samples/amazon-textract-transformer-pipeline/blob/5415fb1befa900466d9f03ca098037f2db06b2b3/img/human-review-sample.png) is able to render detection bounding boxes over a full multi-page document at once, but the [training data annotation UI](https://github.com/aws-samples/amazon-textract-transformer-pipeline/blob/5415fb1befa900466d9f03ca098037f2db06b2b3/notebooks/img/smgt-custom-template-demo.png) is...