scrapy-elasticsearch
Ability to specify ingest pipeline as a query parameter
I can't see how I can specify an ingest pipeline on the elasticsearch bulk request:
https://www.elastic.co/guide/en/elasticsearch/reference/5.2/ingest.html
A longer-term fix would be the ability to pass these settings as variables from the Scrapy settings.py.
A shorter-term fix would be to document which methods can be overridden to get this behaviour.
@dbuijs thanks for pointing it out, this feature is new to me. Do you mind updating the documentation to reflect this setting?
Reading it from settings.py is a pretty easy change, but I will need to familiarize myself with the feature before making it happen.
I'm afraid I haven't got a complete workaround yet. I'll be experimenting with a few fixes and once I have one working I'll update this issue.
It's a query parameter that needs to be appended to the index PUT request (or to the bulk request). The Python client's bulk method accepts it: http://elasticsearch-py.readthedocs.io/en/master/api.html#elasticsearch.Elasticsearch.bulk
I'm guessing we can get it done by overriding the place where elasticsearch.helpers.bulk gets called, so that the call includes the additional parameters.
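A minimal sketch of that idea, not the project's actual code. It assumes a hypothetical setting name, ELASTICSEARCH_INGEST_PIPELINE, and relies on my understanding that elasticsearch.helpers.bulk forwards extra keyword arguments to the client's bulk call, which takes pipeline as a query parameter. The bulk function is passed in so the logic can be tried without a running cluster.

```python
def bulk_kwargs_from_settings(settings):
    """Build extra keyword arguments for a bulk call from Scrapy-style settings.

    ELASTICSEARCH_INGEST_PIPELINE is a hypothetical setting name used only
    for illustration; it is not an existing scrapy-elasticsearch option.
    """
    kwargs = {}
    pipeline = settings.get("ELASTICSEARCH_INGEST_PIPELINE")
    if pipeline:
        # Assumption: the bulk helper passes this through as the
        # ?pipeline=<name> query parameter on the bulk request.
        kwargs["pipeline"] = pipeline
    return kwargs


def send_items(bulk_func, client, actions, settings):
    """Call bulk_func (for example elasticsearch.helpers.bulk) with the
    extra keyword arguments derived from the settings."""
    return bulk_func(client, actions, **bulk_kwargs_from_settings(settings))
```

With the real client this would be called roughly as send_items(helpers.bulk, es, actions, crawler.settings), after from elasticsearch import helpers. When the setting is absent, no extra argument is sent and the bulk call behaves as before.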