elasticsearch-river-web

Indexing PDF content

Open jirkaMat opened this issue 11 years ago • 2 comments

Hi, I have a problem with indexing PDF files. It seems that the MIME type is not recognized, because the content of the PDF file is not extracted. It just stores the raw file content, like '%PDF-1.4 ... 5 0 obj <> stream ...'.

I get the same result with xls and doc files.
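
To make the symptom easier to check, here is a minimal sketch in Python that flags an indexed field value that still holds raw PDF data rather than extracted text. It relies only on the fact that PDF files begin with the `%PDF-` header. The function name `looks_like_raw_pdf` and the sample strings are hypothetical and are not part of river-web or Elasticsearch.

```python
def looks_like_raw_pdf(field_value: str) -> bool:
    """Return True if an indexed text field still appears to hold raw PDF data."""
    # A PDF file begins with the "%PDF-" header followed by a version number,
    # so properly extracted text should not normally start with it.
    return field_value.lstrip().startswith("%PDF-")


# Hypothetical sample values, not taken from a real index.
raw_value = "%PDF-1.4 5 0 obj <> stream ..."
extracted_value = "plain extracted text"

print(looks_like_raw_pdf(raw_value))        # True
print(looks_like_raw_pdf(extracted_value))  # False
```

This only detects the PDF case. It says nothing about why extraction was skipped, and it would not catch xls or doc files, which use a different binary header.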

Could you help me, please? Thank you.

jirkaMat, Jan 05 '15 12:01

Is the file available on the internet? I'd like to reproduce the problem.

marevol, Jan 08 '15 01:01

Hi. Yes, the file is publicly accessible on the internet: http://www.csas.cz/static_internet/cs/Komunikace/Interni_komunikace/Informacni_kniha/Prilohy/TOP_Business_sdeleni_klientum.pdf But I don't think the problem is with the file. Did I understand correctly that river-web indexes the content of PDF files directly, or should I use the attachment plug-in?

jirkaMat, Jan 08 '15 07:01