
Getting error 'PDF stream Length incorrect' while trying to index a PDF file

Status: Open · mviejo33 opened this issue 2 years ago · 2 comments

Device Information: Docker (docker-compose configuration)

services:
  elasticsearch:
    image: elasticsearch:7.17.9
    restart: unless-stopped
    environment:
      - "discovery.type=single-node"
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
      - "ingest.geoip.downloader.enabled=false"
  sist2-admin:
    image: simon987/sist2:3.1.4-x64-linux
    restart: unless-stopped
    volumes:
      - ./sist2-admin-data/:/sist2-admin/
      - /:/host
    ports:
      - 4090:4090 # sist2
      - 8080:8080 # sist2-admin
    working_dir: /root/sist2-admin/
    entrypoint: python3 /root/sist2-admin/sist2_admin/app.py


Describe the bug: Getting the error 'PDF stream Length incorrect' while trying to index a PDF file.

Steps To Reproduce:

  1. Run indexing with PDF files included (Spanish and English).


Additional context: relevant log output:

2023-08-23 23:50:58 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] Starting parse job {c20e9c898c6fb55a03a132ef684b2a69}
2023-08-23 23:51:00 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: PDF stream Length incorrect
2023-08-23 23:51:03 [DEBUG /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: ... repeated 2 times...
2023-08-23 23:51:03 [WARNING /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: OCR Disabled in this build
2023-08-23 23:51:03 [WARNING /sist2-admin/data/gls/Titulos/114 SANTA MARTHA AUMENTO B.pdf] FZ: aborting process from uncaught error!
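Reading the log: the `FZ:` prefix suggests these messages are passed through from MuPDF (fitz), which sist2 uses to parse PDFs. 'PDF stream Length incorrect' is logged at DEBUG level and appears to be a recoverable warning that a stream's declared `/Length` does not match where its `endstream` keyword actually sits. The abort comes right after 'OCR Disabled in this build', which points at OCR rather than the stream length as the actual failure.

To check whether a given PDF really contains such streams, here is a minimal Python sketch. `find_length_mismatches` is a hypothetical helper, not part of sist2 or MuPDF. It scans the raw bytes and only checks direct integer lengths, skipping indirect ones such as `/Length 12 0 R`. It can also be misled by binary stream data that happens to contain the `stream`/`endstream` keywords.

```python
import re
import sys

# "stream" keyword followed by an end-of-line marker. \b keeps it from matching
# the tail of "endstream" (the "d" before it is a word character).
STREAM_KW = re.compile(rb"\bstream(?:\r\n|\n|\r)")
# A direct integer /Length; group 2 is set when the value is an indirect
# reference such as "/Length 12 0 R", which this sketch does not resolve.
LENGTH_RE = re.compile(rb"/Length\s+(\d+)(\s+\d+\s+R)?")


def find_length_mismatches(data: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, declared, actual) for streams whose direct /Length
    does not match the bytes found before the next "endstream"."""
    mismatches = []
    pos = 0
    while True:
        m = STREAM_KW.search(data, pos)
        if m is None:
            break
        start = m.end()
        end = data.find(b"endstream", start)
        if end == -1:
            break  # truncated file: no closing keyword
        # Look back (at most 1 KiB) for the stream dictionary after "obj".
        header = data[max(0, m.start() - 1024):m.start()]
        obj_at = header.rfind(b"obj")
        if obj_at != -1:
            header = header[obj_at + 3:]
        length = LENGTH_RE.search(header)
        if length is not None and length.group(2) is None:
            declared = int(length.group(1))
            body = data[start:end]
            # The EOL before "endstream" is not counted in /Length.
            if body.endswith(b"\r\n"):
                trimmed = len(body) - 2
            elif body.endswith((b"\n", b"\r")):
                trimmed = len(body) - 1
            else:
                trimmed = len(body)
            if declared not in (len(body), trimmed):
                mismatches.append((m.start(), declared, trimmed))
        pos = end + len(b"endstream")
    return mismatches


def main(paths: list[str]) -> None:
    for path in paths:
        with open(path, "rb") as f:
            for offset, declared, actual in find_length_mismatches(f.read()):
                print(f"{path}: stream at byte {offset}: /Length {declared}, actual {actual}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: `python check_pdf_lengths.py "114 SANTA MARTHA AUMENTO B.pdf"` (the script name is arbitrary). No output means no mismatched direct lengths were found.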

mviejo33 · Aug 29 '23 20:08

Hi, OCR is broken in 3.1.4; you can use an earlier or later version for now, sorry!

simon987 · Sep 16 '23 12:09

I tried 3.1.3 and got the same result.

mviejo33 · Sep 20 '23 00:09