Fulltextsearch hangs with complex PDF due to Ghostscript bug in version 10.0.0
I know this is not strictly a Fulltextsearch issue, but I am posting it here so people can find a simple solution a little quicker: Ubuntu 23.04 ships with Ghostscript version 10.0.0, and so does the Nextcloud Docker image.
During an indexing process, I noticed that it got stuck on a particular PDF file, and I found out that a simple text extraction via Ghostscript was hanging.
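For anyone who wants to check a suspect file the same way, here is a rough sketch of running a text extraction with a time limit. The thread does not show the exact command Fulltextsearch runs, so the txtwrite device, the GS_BIN variable, the check_pdf name and the 60-second default are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch: run Ghostscript text extraction on one PDF with a time limit.
# GS_BIN, check_pdf and the 60-second default are illustrative assumptions.
check_pdf() {
    pdf="$1"
    limit="${2:-60}"
    # txtwrite is Ghostscript's text-extraction device; the text is discarded.
    timeout "$limit" "${GS_BIN:-gs}" -q -dNOPAUSE -dBATCH \
        -sDEVICE=txtwrite -sOutputFile=/dev/null "$pdf"
    status=$?
    if [ "$status" -eq 124 ]; then
        # timeout(1) exits with 124 when the time limit was reached
        echo "TIMEOUT: $pdf"
    elif [ "$status" -ne 0 ]; then
        echo "ERROR($status): $pdf"
    else
        echo "OK: $pdf"
    fi
}
```

Something like "check_pdf /path/to/suspect.pdf 300" would then report TIMEOUT if gs is still busy after five minutes. A very large PDF can legitimately take a long time, so a timeout is only a hint, not proof of the bug.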
I went to the official GS site and downloaded the Ghostscript 10.02.0 source: https://ghostscript.com/releases/gsdnld.html
- uncompress it (e.g. tar -xvf ghostscript-10.02.0.tar.gz)
- go into the extracted folder
- sudo ./configure
- sudo make install
- restart the terminal
- test with gs -v
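If you would rather script the final check than read the banner by eye, a small sketch along these lines compares the installed version against a minimum. The 10.02.0 default is simply the version that worked in this post, not a confirmed "first fixed" release, and GS_BIN and gs_at_least are names made up for the example. It relies on sort -V from GNU coreutils.

```shell
#!/bin/sh
# Sketch: succeed if the installed Ghostscript is at least a given version.
# GS_BIN, gs_at_least and the 10.02.0 default are illustrative assumptions.
gs_at_least() {
    min="${1:-10.02.0}"
    # "gs --version" prints only the version number
    cur="$("${GS_BIN:-gs}" --version)" || return 2
    # sort -V orders version strings; if min sorts first (or equals cur),
    # the installed version is new enough.
    lowest="$(printf '%s\n%s\n' "$min" "$cur" | sort -V | head -n 1)"
    [ "$lowest" = "$min" ]
}
```

Usage could be as simple as: gs_at_least && echo "gs is new enough" || echo "gs is older than 10.02.0"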
If you get "cannot find -lXext" during the linker stage, simply install the libXext development package.
Under Ubuntu:
sudo apt-get install libxext-dev
Under Fedora:
sudo dnf install libXext-devel
Under Arch Linux:
sudo pacman -S libxext
Then run the build again.
Find out where your gs is located with "which gs" and replace that binary with the newly built one.
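A cautious way to do the swap is to keep a copy of the old binary first. This is only a sketch: replace_binary is a made-up helper name, the ".orig" suffix is my own choice, and as far as I know the build leaves the new binary at bin/gs inside the source folder, so treat that path as an assumption and check it on your system.

```shell
#!/bin/sh
# Sketch: back up the existing binary, then put the new one in its place.
# replace_binary and the ".orig" suffix are illustrative assumptions.
replace_binary() {
    new="$1"
    target="$2"
    [ -f "$new" ] || { echo "missing: $new" >&2; return 1; }
    # Keep the old binary (mode and timestamps preserved) for an easy rollback
    cp -p "$target" "$target.orig" || return 1
    # install copies the file and sets the executable mode in one step
    install -m 755 "$new" "$target"
}
```

For example, run as root: replace_binary ./bin/gs "$(command -v gs)". To roll back, copy the ".orig" file back over the target.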
I built it on Ubuntu 23.04 and copied the binary into the official Nextcloud Docker image.
gs -version
GPL Ghostscript 10.02.0 (2023-09-13)
Copyright (C) 2023 Artifex Software, Inc. All rights reserved.
That solved my issue with a hanging index run.
May it help!
During an indexing process, I noticed that it got stuck on a particular PDF file, and I found out that a simple text extraction via Ghostscript was hanging.
I have a single PDF file that hangs too. It's complex and 23,404 pages long. No errors in FTS, and it just "hangs". How were you able to verify that it was gs causing the issue?
I'm tempted to go ahead and force an upgrade of GS like you did above. My server is Debian 12, which is also currently set to GS 10.00.0 in the repo. I'd kinda like to verify that's the issue first though.
I was inspecting the gs bug tracker back then and simply built the latest version and replaced the binary. Then it worked.
Thank you! I have several complex files that just "hang" with no errors or indication of the problem. Trying your fix now.
EDIT: Reporting back. Several files that would "hang" have been indexed successfully. I probably have several more days of indexing until the process is complete due to the number and size of files I have, but there is a NOTICEABLE DIFFERENCE with GS 10.03.0!
THANK YOU!!!!!!