dask-elk
Use dask to fetch data from Elasticsearch in parallel by sending the request to each shard separately.
Table of Contents
- Introduction
- Usage
Introduction
The library tries to imitate the functionality of the ES-Hadoop plugin for Spark: dask-elk
performs a parallel read across all shards of the target indices.
To achieve this, it uses the Elasticsearch scroll mechanism.
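The per-shard scroll read can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not dask-elk's actual implementation. It assumes an elasticsearch-py 7.x-style client exposing `search`, `scroll` and `clear_scroll`, and it restricts each request to a single shard with the `preference="_shards:<id>"` search parameter. It also swaps dask for a standard-library thread pool so it runs without extra dependencies. `scroll_shard`, `read_all_shards` and `FakeShardedClient` are hypothetical names. The fake client is an in-memory stand-in for a cluster, included only so the sketch executes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

Doc = Dict[str, Any]


def scroll_shard(es, index: str, shard_id: int, query: Doc,
                 size: int = 1000, scroll: str = "1m") -> List[Doc]:
    """Scroll through every hit of one shard and return the `_source` dicts."""
    # Assumption: preference="_shards:<id>" restricts the search to that shard.
    resp = es.search(index=index, body=query, size=size, scroll=scroll,
                     preference=f"_shards:{shard_id}")
    scroll_id = resp["_scroll_id"]
    docs: List[Doc] = []
    try:
        hits = resp["hits"]["hits"]
        while hits:  # an empty page means this shard is exhausted
            docs.extend(hit["_source"] for hit in hits)
            resp = es.scroll(scroll_id=scroll_id, scroll=scroll)
            scroll_id = resp["_scroll_id"]
            hits = resp["hits"]["hits"]
    finally:
        es.clear_scroll(scroll_id=scroll_id)  # release the server-side context
    return docs


def read_all_shards(es, index: str, num_shards: int, query: Doc,
                    size: int = 1000) -> List[Doc]:
    """Run one scroll per shard concurrently and concatenate the results."""
    with ThreadPoolExecutor(max_workers=max(1, num_shards)) as pool:
        parts = list(pool.map(
            lambda shard: scroll_shard(es, index, shard, query, size),
            range(num_shards)))
    return [doc for part in parts for doc in part]


class FakeShardedClient:
    """Hypothetical in-memory stand-in for a cluster; it ignores the query."""

    def __init__(self, shards: List[List[Doc]]):
        self.shards = shards
        self.pending: Dict[str, List[Doc]] = {}
        self.sizes: Dict[str, int] = {}
        self.cleared: List[str] = []

    def _page(self, scroll_id: str) -> Doc:
        size = self.sizes[scroll_id]
        page = self.pending[scroll_id][:size]
        self.pending[scroll_id] = self.pending[scroll_id][size:]
        return {"_scroll_id": scroll_id,
                "hits": {"hits": [{"_source": d} for d in page]}}

    def search(self, index, body, size, scroll, preference):
        shard_id = int(preference.split(":")[1])
        scroll_id = f"{index}-{shard_id}"
        self.pending[scroll_id] = list(self.shards[shard_id])
        self.sizes[scroll_id] = size
        return self._page(scroll_id)

    def scroll(self, scroll_id, scroll):
        return self._page(scroll_id)

    def clear_scroll(self, scroll_id):
        self.cleared.append(scroll_id)


es = FakeShardedClient([[{"id": 1}, {"id": 2}, {"id": 3}], [{"id": 4}]])
docs = read_all_shards(es, "my-index", 2, {"query": {"match_all": {}}}, size=2)
print(sorted(d["id"] for d in docs))  # [1, 2, 3, 4]
```

Each shard is paged independently, so the slowest shard bounds the total read time. The scroll context is always cleared, even if a page request fails.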
Usage
To use the library and read from an index:
from dask_elk.client import DaskElasticClient
# First create a client
client = DaskElasticClient() # localhost Elasticsearch
index = 'my-index'
df = client.read(index=index, doc_type='_doc')
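Because the read is split by shard, the parallelism of a call like the one above follows the shard layout of the target index. The sketch below is an assumption-laden illustration, not dask-elk's code. It takes the JSON returned by the Elasticsearch `_search_shards` API (exposed in elasticsearch-py as `search_shards`), whose `shards` field holds one list of copies per shard. From each list it picks one readable copy. `plan_partitions` and the sample response are hypothetical, and the sample is abbreviated to the fields the function reads.

```python
from typing import Any, Dict, List, Tuple


def plan_partitions(search_shards_resp: Dict[str, Any]) -> List[Tuple[str, int, str]]:
    """Return one (index, shard, node) entry per shard of a `_search_shards` response.

    Picks the started primary copy when there is one, otherwise any started replica.
    """
    plan = []
    for group_no, copies in enumerate(search_shards_resp["shards"]):
        started = [c for c in copies if c.get("state") == "STARTED"]
        if not started:
            # Skipping the shard would silently drop documents, so fail instead.
            raise RuntimeError(f"no started copy in shard group {group_no}")
        chosen = next((c for c in started if c.get("primary")), started[0])
        plan.append((chosen["index"], chosen["shard"], chosen["node"]))
    return sorted(plan)


# Abbreviated, illustrative response for a two-shard index with one replica.
sample = {
    "shards": [
        [{"index": "my-index", "shard": 0, "primary": True, "node": "node-a", "state": "STARTED"},
         {"index": "my-index", "shard": 0, "primary": False, "node": "node-b", "state": "STARTED"}],
        [{"index": "my-index", "shard": 1, "primary": True, "node": "node-b", "state": "STARTED"},
         {"index": "my-index", "shard": 1, "primary": False, "node": "node-a", "state": "INITIALIZING"}],
    ]
}
print(plan_partitions(sample))  # [('my-index', 0, 'node-a'), ('my-index', 1, 'node-b')]
```

Preferring the primary is only a deterministic tie-break chosen for the sketch. Any started copy holds the same documents.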
You can also pass a query that is pushed down to Elasticsearch, so that filtering is done on the Elasticsearch side. Because dask-elk
uses the scroll mechanism, aggregations are not supported.
from dask_elk.client import DaskElasticClient
# First create a client
client = DaskElasticClient() # localhost Elasticsearch
query = {
    "query": {
        "term": {"user": "kimchy"}
    }
}
index = 'my-index'
df = client.read(query=query, index=index, doc_type='_doc')
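Since aggregations are not supported by these scroll-based reads, a caller may want to fail fast rather than send a request body that asks for them. The helper below is hypothetical and not part of dask-elk. It is a minimal sketch that rejects a body containing a top-level `aggs` or `aggregations` key (Elasticsearch accepts both spellings) and otherwise returns the body unchanged.

```python
from typing import Any, Dict


def check_pushdown_query(body: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if `body` requests aggregations; otherwise return it as-is."""
    found = body.keys() & {"aggs", "aggregations"}
    if found:
        raise ValueError(
            f"aggregations are not supported with scroll reads: {sorted(found)}")
    return body


query = {"query": {"term": {"user": "kimchy"}}}
check_pushdown_query(query)  # a plain filter query passes through unchanged

try:
    check_pushdown_query({**query, "aggs": {"by_user": {"terms": {"field": "user"}}}})
except ValueError as exc:
    print(exc)
```

Aggregations can instead be computed on the resulting dataframe after the filtered documents have been read.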
Read documentation here