FOSElasticaBundle
Attachment ingest issue
Hello, I am using ES 5.2.2 with the ingest attachment plugin and I am trying to search through doc/pdf files.
Sending a file to index/documents/test?pipeline=attachment creates
"data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
"attachment": {
"content_type": "application/rtf",
"language": "ro",
"content": "Lorem ipsum dolor sit amet",
"content_length": 28
},
so I can search through attachment.content field, however
app/console fos:elastica:populate --no-reset
with this mapping
data:
    type: attachment
    path: full
    fields:
        name: { store: yes }
        title: { store: yes }
        date: { store: yes }
        content: { term_vector: with_positions_offsets, store: yes }
only creates
"data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
without desired attachment/content fields.
Any hints on what I am doing wrong? Thanks
I'm having a similar issue, but I'm not sure if ElasticaBundle supports ingest-attachment vs mapper-attachments.
I just get the error: No handler for type [attachment] declared on field [content]
I have never used attachments. Can you guys make a PR to resolve the issue?
I'm migrating to a newer Elasticsearch and facing similar problems. From my own research I think the new Ingest Attachment plugin works a bit differently. You first define a "pipeline", where you configure the attachment plugin to take the original document, read a base64-encoded file from one field and put a "parsed" representation of it (an object containing content, mime type, etc.) into another field. The parsed object looks like this:
{
"content_type": "application/rtf",
"language": "ro",
"content": "Lorem ipsum dolor sit amet",
"content_length": 28
}
So clearly the base64-encoded file and the "parsed" result must be separate fields, because they are different types. Another complication is that in order to actually use a pipeline, you must specify it as a query parameter (i.e. ?pipeline=my-custom-pipeline-that-parses-files). I don't think there's a nice way of doing it in FOSElasticaBundle, right?
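For reference, the two steps described above (define a pipeline, then index with ?pipeline=...) can be sketched as plain HTTP payloads. This is a minimal Python sketch using only the standard library, not FOSElasticaBundle code. The pipeline name "attachment", the path documents/test and the field name "data" follow the first post; the host, the helper names and the document ID are assumptions made for illustration. The functions only build the requests, they do not send them.

```python
import base64
import json
from urllib.parse import quote, urlencode

ES_HOST = "http://localhost:9200"  # assumption: local single-node cluster


def pipeline_request(name, source_field="data"):
    """Build (method, url, body) for creating an ingest pipeline."""
    body = {
        "description": "Extract text from a base64-encoded file",
        "processors": [
            # The attachment processor reads base64 from `field` and writes
            # the parsed object to `target_field` ("attachment" by default).
            {"attachment": {"field": source_field}}
        ],
    }
    url = f"{ES_HOST}/_ingest/pipeline/{quote(name)}"
    return "PUT", url, json.dumps(body)


def index_request(index, doc_type, doc_id, raw_bytes, pipeline,
                  source_field="data"):
    """Build (method, url, body) for indexing a file through a pipeline."""
    # The raw file goes into its own field as base64; the parsed result
    # ends up in a separate field created by the pipeline.
    body = {source_field: base64.b64encode(raw_bytes).decode("ascii")}
    query = urlencode({"pipeline": pipeline})
    url = f"{ES_HOST}/{index}/{doc_type}/{doc_id}?{query}"
    return "PUT", url, json.dumps(body)
```

The pipeline only has to be created once; every index request that should be parsed then needs the pipeline query parameter, which is the part that has no obvious place in the bundle's mapping configuration.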
Another gotcha is that pipelines are not supported with the update API. If new files are added to an existing entity, they won't even be processed by the pipeline.
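One possible workaround, which nobody in this thread has confirmed, is to avoid partial updates for documents with files: merge the change into the full source and send it through the index API again with the pipeline parameter. A rough Python sketch of building such a request follows; the function name, the argument names and the "attachment" target field are assumptions for illustration.

```python
import json
from urllib.parse import urlencode


def reindex_request(base_url, index, doc_type, doc_id,
                    current_source, changes, pipeline,
                    parsed_field="attachment"):
    """Build (method, url, body) that re-indexes a whole document.

    Replaces a partial update, so the ingest pipeline runs again.
    """
    merged = {**current_source, **changes}
    # Drop the previously parsed object so the pipeline regenerates it
    # from the (possibly new) base64 field.
    merged.pop(parsed_field, None)
    query = urlencode({"pipeline": pipeline})
    url = f"{base_url}/{index}/{doc_type}/{doc_id}?{query}"
    return "PUT", url, json.dumps(merged)
```

The trade-off is that the whole document, including the base64 payload, has to be sent again on every change.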
So yeah... Not sure how to even approach this. I think files are just a special case and if this bundle ever supports this use case, it should instead support pipelines in general. For now I think I'm just gonna stick with the deprecated attachment mapper plugin.
Any news? Elastica.io now fully supports pipelines and the ingest attachment plugin.