gnfinder

Internal Server Error

Open • gdower opened this issue 1 year ago • 1 comment

I get an internal server error when using this URL:

http://dermestidae.wz.cz/wp-content/uploads/2024/09/Catalogue-Derodontidae-2024.pdf

at:

https://finder.globalnames.org

The PDF upload also errors, although copying and pasting in the text works.

gdower, Dec 05 '24 22:12

It looks like the problem is with the Apache Tika v1.2 service that we use. Something in this particular PDF breaks Tika.

There are several ways we can fix this problem. There are some new Go libraries that we should try for PDF-to-text conversion and for normalizing text encodings to UTF-8; we could switch to them if they perform well enough.
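To make the encoding-normalization step concrete, here is a minimal standard-library sketch. It is not gnfinder's code: the helper name `toUTF8` is hypothetical, and the fallback assumes that any input which is not valid UTF-8 is ISO-8859-1 (Latin-1). A real implementation would need proper encoding detection.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// toUTF8 is a hypothetical helper, not part of gnfinder.
// Input that is already valid UTF-8 is returned unchanged. Anything else is
// ASSUMED to be ISO-8859-1 (Latin-1), where each byte maps directly to the
// Unicode code point with the same numeric value.
func toUTF8(raw []byte) string {
	if utf8.Valid(raw) {
		return string(raw)
	}
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	// 0xFC is "ü" in Latin-1 and is never a valid byte in UTF-8.
	latin1 := []byte{'B', 'r', 0xFC, 'c', 'k', 'e'}
	fmt.Println(toUTF8(latin1))

	// Plain ASCII is valid UTF-8 and passes through untouched.
	fmt.Println(toUTF8([]byte("Derodontidae")))
}
```

The Latin-1 fallback is chosen here only because it can be written without external dependencies; it would mis-decode text in other legacy encodings.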

There is also a newer version of Tika (v3.0.0), which might help, but in that case we would have to rebuild the Tika client, because the new version's API is not compatible with the one we currently use.

dimus, Dec 06 '24 15:12