opensearchserver Missing pdf title

Hello,

I'm using a web crawler index. When it crawls a PDF document, it normally extracts a title, but for some documents it does not. If I then go to the render template and search for such a document, I cannot follow the link, because the link is title-based.

Could you please advise me on how to fix this? Is it a bug or normal behaviour? If you need some example documents, I can provide some URLs.

Thanks for the great work and for your previous support.

May 03 '17 08:05 Mojster

Definitely interested in an example.

Currently we are using the PDFBox library to extract this information. We may update the library (if required) or open an issue.

May 08 '17 21:05 emmanuel-keller

Example image with search results: [image: pdf_example]

Link to the working PDF from the image:
http://home.izum.si/izum/e-prirocniki/5_COBISS3_Izposoja/Cel_5_COBISS3_Izposoja.pdf

Links to PDFs with no title:
http://home.izum.si/cobiss/oz/HTML/OZ_2012_4_final/files/assets/common/downloads/publication.pdf
http://home.izum.si/cobiss/OZ/HTML/OZ_2012_4_final/files/assets/common/downloads/publication.pdf

Hope this helps you further.

It could also be a problem with the PDF itself. I'll keep investigating that part.

May 10 '17 08:05 Mojster

I found out that if I open the file in Acrobat Reader and go to File->Properties, there's a title field. If it's empty, then PDFBox has nothing to extract.

I'm closing the issue because this is a mistake of the PDF issuer.

May 10 '17 10:05 Mojster

I'd like to reopen this. If there is no title in the PDF file, there should be a fallback, e.g. use the URL as the title. Otherwise the user can't click the result.
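A minimal sketch of what such a fallback could look like, in plain Java. The class `TitleFallback` and the method `resolveTitle` are hypothetical names used only for illustration; they are not part of OpenSearchServer or PDFBox, and the sketch assumes the parser hands over the extracted title (possibly null or blank) together with the crawled URL.

```java
// Hypothetical helper, not part of OpenSearchServer or PDFBox.
final class TitleFallback {

    private TitleFallback() {
    }

    /**
     * Returns the extracted title if it contains visible text,
     * otherwise falls back to the document URL so that the search
     * result still has something clickable to display.
     */
    static String resolveTitle(String extractedTitle, String url) {
        if (extractedTitle != null) {
            String trimmed = extractedTitle.trim();
            if (!trimmed.isEmpty()) {
                return trimmed;
            }
        }
        // Assumption: the URL itself is an acceptable display title.
        return url;
    }

    public static void main(String[] args) {
        String url = "http://home.izum.si/cobiss/oz/HTML/OZ_2012_4_final"
                + "/files/assets/common/downloads/publication.pdf";
        // The PDF has no title in its properties, so the URL is used.
        System.out.println(resolveTitle(null, url));
        // A PDF with a real title keeps it.
        System.out.println(resolveTitle("Some title", url));
    }
}
```

Treating a whitespace-only title the same as a missing one is a design choice in this sketch: an empty-looking title would leave the result just as unclickable as no title at all.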

Jan 22 '18 22:01 Marx1st

I agree with you. So I've reopened it.

Feb 16 '18 09:02 Mojster
