amazon-textract-textractor
Trouble replicating markdown output
I tried out the code from this example: https://aws-samples.github.io/amazon-textract-textractor/notebooks/document_linearization_to_markdown_or_html.html#All-entities-can-be-linearized
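The Markdown output I get differs from the output shown in that notebook and is malformed. Cells that contain line breaks are split across several lines, which breaks the tables: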
| CO.
| FILE
| DEPT.
| CLOCK
| NUMBER
|
|-----|------------------------------|---|---|---|
| ABC | 126543 123456 12345 00000000 | | | |
| | |
|----------------|-----------|
| Period ending: | 7/18/2008 |
| Pay date: | 7/25/2008 |
| | |
|----------|-----------------------|
| Federal: | 3. $25 Additional Tax |
| State: | 2 |
| Local: | 2 |
| Earnings
| rate
| hours
| this period
| year to date
|
|----------|-----------|-------|----------|-----------|
| Regular | 10.00 | 32.00 | 320.00 | 16,640.00 |
| Overtime | 15.00 | 1.00 | 15.00 | 780.00 |
| Holiday | 10.00 | 8.00 | 80.00 | 4,160.00 |
| Tuition | | | 37.43 | 1,946.80 |
| | Gross Pay | | $ 452.43 | 23,526.80 |
| | | |
|-----------------|-------------|---------------|
| Other Benefits and
Information | this period | total to date |
| Group Term Life | 0.51 | 27.00 |
| Loan Amt Paid | | 840.00 |
| Vac Hrs | | 40.00 |
| Sick Hrs | | 16.00 |
| Title | Operator | |
| | | | |
|------------|---------------------|---------|----------|
| Deductions | Statutory
Federal Income Tax | -40.60 | 2,111.20 |
| | Social Security Tax | -28.05 | 1,458.60 |
| | Medicare Tax | -6.56 | 341.12 |
| | NY State Income Tax | -8.43 | 438.36 |
| | NYC Income Tax | -5.94 | 308.88 |
| | NY SUI/SDI Tax | -0.60 | 31.20 |
| | Other
Bond | -5.00 | 100.00 |
| | 401(k) | -28.85 | 1,500.20 |
| | Stock Plan | -15.00 | 150.00 |
| | Life Insurance | -5.00 | 50.00 |
| | Loan | -30.00 | 150.00 |
| | Adjustment
Life Insurance | + 13.50 | |
| | Net Pay | $291.90 | |
| | |
|-----------------------|-------------|
| Payroll check number: | 0000000000 |
| Pay date: | 7/25/2008 |
| Social Security No. | 987-65-4321 |
| | | |
|--------------|-------------------------------------------|---------|
| Pay to the
order of: | JOHN STILES | |
| This amount: | TWO HUNDRED NINETY-ONE AND 90/100 DOLLARS | $291.90 |
This is my code:
import os
from PIL import Image
from textractor import Textractor
from textractor.visualizers.entitylist import EntityList
from textractor.data.constants import TextractFeatures
image = Image.open("stub1.jpg").convert("RGB")
extractor = Textractor(region_name="us-west-2")
document = extractor.analyze_document(
    file_source=image,
    features=[
        TextractFeatures.LAYOUT,
        TextractFeatures.TABLES,
        TextractFeatures.FORMS,
        TextractFeatures.SIGNATURES,
    ],
    save_image=True,
)
print(document.tables.to_markdown())
I'm using amazon-textract-textractor version 1.8.2 (latest)
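To illustrate what I would expect, here is a minimal sketch of a renderer that collapses line breaks inside cells before building the table. It is plain Python and does not use any textractor API. `rows_to_markdown` and its `newline_replacement` parameter are hypothetical names of my own, and getting the rows out of the textractor table objects is left out. I am assuming the embedded newlines are what breaks the output above.

```python
def rows_to_markdown(rows, newline_replacement=" "):
    """Render rows (lists of cell strings) as a Markdown table.

    Hypothetical helper, not part of amazon-textract-textractor.
    The first row is treated as the header row.
    """
    if not rows:
        return ""

    width = max(len(row) for row in rows)

    def clean(cell):
        text = "" if cell is None else str(cell)
        # Collapse embedded line breaks so each cell stays on one line.
        text = newline_replacement.join(part.strip() for part in text.splitlines())
        # Escape pipes so cell content cannot start a new column.
        return text.replace("|", "\\|")

    # Clean every cell and pad short rows to the full column count.
    padded = [[clean(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, *body = padded

    lines = ["| " + " | ".join(header) + " |"]
    lines.append("|" + "|".join("---" for _ in range(width)) + "|")
    lines.extend("| " + " | ".join(row) + " |" for row in body)
    return "\n".join(lines)


# Sample rows copied from the broken output above.
sample_rows = [
    ["Other Benefits and\nInformation", "this period", "total to date"],
    ["Group Term Life", "0.51", "27.00"],
    ["Loan Amt Paid", "", "840.00"],
]
print(rows_to_markdown(sample_rows))
```

With this, the "Other Benefits and Information" header stays in a single cell. Passing `newline_replacement="<br>"` would keep the visual line break in renderers that allow inline HTML.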