sist2
Scan fails on some PDFs - task doesn't terminate - kill command doesn't work
Device Information (please complete the following information):
- OS: Ubuntu 20.04
- Deployment: Docker simon987/sist2:x64-linux
- SIST2 Version: 3.2
- Elasticsearch Version (if relevant): 7.17.9
Command with arguments [ADMIN ] Starting sist2 command with args ['/root/sist2', 'scan', '/nfs/', '--threads=6', '--thumbnail-quality=50', '--thumbnail-count=1', '--thumbnail-size=552', '--content-size=32768', '--output=/sist2-admin/scan-DBNAME-2023-08-01 06:49:51.139372.sist2', '--depth=-1', '--archive=recurse', '--mem-buffer=2000', '--incremental', '--name=DBNAME', '--treemap-threshold=0.0005', '--json-logs', '--very-verbose']
Describe the bug Indexing fails for a number of PDFs with the following output, and the entire scan then hangs.
2023-08-01 11:45:57 [DEBUG /nfs/file1.PDF] Starting parse job {4c3bcce60239465bf75f8e6c349ca285}
[ERROR ] corrupted size vs. prev_size while consolidating
2023-08-01 11:45:57 [DEBUG tpool.c] Child process terminated with status code 0
2023-08-01 11:45:57 [FATAL tpool.c] Child process crashed (Aborted).
The process was working on /nfs/file1.PDF
Please consider creating a bug report at https://github.com/simon987/sist2/issues !
sist2 is an open source project and relies on the collaboration of its users to diagnose and fix bugs.
2023-08-01 11:45:57 [DEBUG database.c] Opening database /sist2-admin/scan-DBNAME-2023-08-01 06:49:51.139372.sist2 (0)
2023-08-01 11:45:57 [DEBUG database.c] Opening database /dev/shm/sist2-ipc-11.sqlite (1)
[ERROR ] corrupted size vs. prev_size
2023-08-01 11:46:10 [DEBUG tpool.c] Child process terminated with status code 0
2023-08-01 11:46:10 [FATAL tpool.c] Child process crashed (Aborted).
The process was working on /nfs/file2.PDF
Please consider creating a bug report at https://github.com/simon987/sist2/issues !
sist2 is an open source project and relies on the collaboration of its users to diagnose and fix bugs.
2023-08-01 11:46:10 [DEBUG database.c] Opening database /sist2-admin/scan-DBNAME-2023-08-01 06:49:51.139372.sist2 (0)
2023-08-01 11:46:10 [DEBUG database.c] Opening database /dev/shm/sist2-ipc-11.sqlite (1)
2023-08-01 11:46:16 [DEBUG tpool.c] Waiting for worker threads to finish
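A note on the log above: each crash is reported as "terminated with status code 0" and then as "crashed (Aborted)". One possible reading, and it is only an assumption about how the status is logged, is that the "status code" is the exit-status field of the wait status, which is 0 for a child killed by a signal. The sketch below reproduces that effect on Linux with a forked child that calls `abort()`. The helper names `describe_wait_status` and `abort_in_child` are hypothetical and are not part of sist2.

```python
import os
import signal


def describe_wait_status(status):
    """Classify a raw wait status, checking for a signal before the exit code."""
    if os.WIFSIGNALED(status):
        return "killed by " + signal.Signals(os.WTERMSIG(status)).name
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    return "other"


def abort_in_child():
    """Fork a child that aborts immediately and return its raw wait status."""
    pid = os.fork()
    if pid == 0:
        os.abort()  # child: raises SIGABRT, as glibc does on heap corruption
    _, status = os.waitpid(pid, 0)
    return status
```

Reading only `os.WEXITSTATUS(status)` for such a child gives 0, so the exit code alone cannot distinguish a crash from a clean exit. The signal check has to come first.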
Expected behavior The affected files are skipped and the scan proceeds with the rest (40k files).
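To make the expected behavior concrete, here is a minimal sketch of the skip-and-continue pattern. It is not sist2's implementation. It assumes each file can be parsed in its own child process, and the names `run_isolated`, `scan`, `demo_cmd` and the timeout value are hypothetical. A child that is killed by a signal (for example SIGABRT), exits non-zero, or exceeds the timeout is recorded as skipped, and the loop moves on to the next file.

```python
import signal
import subprocess
import sys


def run_isolated(cmd, timeout_s=60.0):
    """Run one parse job in a child process and classify the outcome."""
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging
        return "timeout"
    if proc.returncode < 0:
        # A negative return code means the child was terminated by that signal
        return "crashed:" + signal.Signals(-proc.returncode).name
    if proc.returncode > 0:
        return "failed:%d" % proc.returncode
    return "ok"


def scan(paths, make_cmd, timeout_s=60.0):
    """Process every path; record failures instead of stopping the scan."""
    indexed, skipped = [], []
    for path in paths:
        outcome = run_isolated(make_cmd(path), timeout_s)
        if outcome == "ok":
            indexed.append(path)
        else:
            skipped.append((path, outcome))
    return indexed, skipped


def demo_cmd(path):
    """Hypothetical stand-in parser: aborts when the file name contains 'bad'."""
    code = "import os, sys; os.abort() if 'bad' in sys.argv[1] else None"
    return [sys.executable, "-c", code, path]
```

For example, `scan(["a.pdf", "bad.pdf", "c.pdf"], demo_cmd)` processes all three entries and reports the middle one as skipped, rather than stalling on it.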
Actual Behavior The task stops but is still shown as a running task. Kill does not terminate it; only restarting the Docker container helps.
Additional context The affected files were sent via email.