elasticsearch-river-web: indexing PDF content

Hi, I have a problem indexing PDF files. It seems that the MIME type is not recognized, because the content of the PDF file is not extracted. The index just stores the raw file content, like '%PDF-1.4 ... 5 0 obj <> stream ...'
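To make the symptom easier to check, here is a minimal sketch of how one might tell whether an indexed field holds raw PDF bytes rather than extracted text. It relies only on the fact that a PDF file begins with the `%PDF-` header. The function name `looks_like_raw_pdf` and the idea of passing the field value in as a string are assumptions for illustration, not part of river-web.

```python
# Every PDF file starts with this header, followed by the version (e.g. 1.4).
PDF_MAGIC = "%PDF-"


def looks_like_raw_pdf(content: str) -> bool:
    """Return True if the indexed text still begins with the PDF header.

    Hypothetical helper: `content` is assumed to be the value of the
    indexed body field, fetched separately from Elasticsearch.
    """
    # Ignore leading whitespace that may precede the header in the field.
    return content.lstrip().startswith(PDF_MAGIC)


if __name__ == "__main__":
    # Sample shaped like the value reported in this thread.
    sample = "%PDF-1.4 ... 5 0 obj <> stream ..."
    print(looks_like_raw_pdf(sample))
```

If this returns True for a document, the text extraction step was skipped for that file and the raw bytes were indexed as-is.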

I get the same results with XLS and DOC files.

Could you help me, please? Thank you.

Jan 05 '15 12:01 jirkaMat

Is the file available on the internet? I'd like to reproduce the problem.

Jan 08 '15 01:01 marevol

Hi, yes, the file is publicly accessible on the internet: http://www.csas.cz/static_internet/cs/Komunikace/Interni_komunikace/Informacni_kniha/Prilohy/TOP_Business_sdeleni_klientum.pdf But I think the problem is not in the file. Did I understand correctly that river-web indexes the content of PDF files directly, or should I use the attachment plugin?

Jan 08 '15 07:01 jirkaMat