
pandas_to_eland - cannot write into index by alias name

Open · akerfx opened this issue 11 months ago · 1 comment

Hi all,

I followed this tutorial to roll over an index without data streams.

If I go through it step by step in Dev Tools directly in Kibana, it works fine.

In my use case, the data is in a pandas DataFrame, and I want to use the pandas_to_eland() function to ingest it into an aliased index. However, pandas_to_eland() does not seem to work when es_dest_index is an alias.

I tried two eland versions:

- eland 8.15.3
- eland 8.17.0

Elasticsearch Cloud version: 8.15.3

Steps to reproduce in DevTools

This manual flow in Dev Tools works as expected (these steps are the prerequisites for the eland step):

```
PUT _ilm/policy/my-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "1GB",
            "max_age": "1m"
          }
        }
      }
    }
  }
}

PUT _index_template/my_template
{
  "index_patterns": [
    "my-index-*"
  ],
  "template": {
    "settings": {
      "index.lifecycle.name": "my-rollover-policy",
      "index.lifecycle.rollover_alias": "my-alias"
    }
  }
}

PUT /my-index-000001
{
  "aliases": {
    "my-alias": {
      "is_write_index": true
    }
  }
}

POST /my-alias/_doc
{
  "message": "Hello, World!",
  "@timestamp": "2024-01-01T12:00:00Z"
}
```
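With this setup, ILM rolls my-alias over once the current write index exceeds 1GB or is older than one minute (`"max_age": "1m"`). ILM only evaluates these conditions periodically, so the rollover can happen some minutes later than that. Each rollover creates a new index and makes it the alias's write index; by default, Elasticsearch derives the new name by incrementing the numeric suffix of the current write index and zero-padding it to six digits, which is why my-index-000002 shows up in the get_mapping output further down. The sketch below illustrates that naming rule only. `next_rollover_index` is a hypothetical helper, not an Elasticsearch or eland API, and it assumes the name ends in `-` followed by digits (date-math index names are ignored).

```python
import re


def next_rollover_index(name: str) -> str:
    """Sketch of Elasticsearch's default rollover naming (hypothetical helper)."""
    # Split "my-index-000001" into the prefix "my-index" and the suffix "000001".
    match = re.fullmatch(r"(.*)-(\d+)", name)
    if match is None:
        raise ValueError(f"{name!r} does not end with '-' and a number")
    prefix, number = match.groups()
    # Increment the suffix and always zero-pad it to six digits.
    return f"{prefix}-{int(number) + 1:06d}"


print(next_rollover_index("my-index-000001"))
```

For example, `next_rollover_index("my-index-3")` gives `my-index-000004`, because the suffix is always padded to six digits regardless of the previous name.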

Error with pandas_to_eland()

After the index is configured and created with one document, I want to ingest a second document with pandas_to_eland:

```python
import eland as ed
import pandas as pd

from elasticsearch import Elasticsearch
from elastic_transport import RequestsHttpNode

if __name__ == "__main__":

    # Placeholder credentials: substitute your Cloud ID and (API key ID, API key) pair.
    es_client = Elasticsearch(
        cloud_id="ES_CLOUD_ID",
        api_key=("ES_API_ID", "ES_API_KEY"),
        node_class=RequestsHttpNode,
        request_timeout=60,
        max_retries=10,
        retry_on_timeout=True,
    )

    data = {
        "message": "Hello, World 2!",
        "@timestamp": "2025-01-01T12:00:00Z"
    }

    df = pd.DataFrame([data])

    print(df)

    # Append the row through the alias; this is the call that raises the KeyError.
    ed.pandas_to_eland(
        pd_df=df,
        es_client=es_client,
        es_dest_index="my-alias",
        es_if_exists="append",
    )
```

Running this script fails with:

```
Traceback (most recent call last):
  File "/fs075/sd19a/destech/devel/aklein/lib_cell_device_stats/trunk/src/eland_ingest_github.py", line 44, in <module>
    ed.pandas_to_eland(
  File "/fs075/sd19a/destech/devel/CAD_shared/python_environments/libs_statistics_tool/lib/python3.10/site-packages/eland/etl.py", line 180, in pandas_to_eland
    dest_mapping = es_client.indices.get_mapping(index=es_dest_index)[
  File "/fs075/sd19a/destech/devel/CAD_shared/python_environments/libs_statistics_tool/lib/python3.10/site-packages/elastic_transport/_response.py", line 186, in __getitem__
    return self.body[item]  # type: ignore[index]
KeyError: 'my-alias'
```

Output of self.body in elastic_transport/_response.py, line 186:

```
{'my-index-000002': {'mappings': {}}, 'my-index-000001': {'mappings': {'properties': {'@timestamp': {'type': 'date'}, 'message': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}}}}}
```

It seems that pandas_to_eland() expects the es_dest_index name (my-alias) to be a key of self.body. However, the get_mapping response is keyed by the concrete indices behind the alias (my-index-000001 and my-index-000002), not by the alias itself, so looking up my-alias raises a KeyError.
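The failing lookup can be reproduced offline with the response body recorded above, without a cluster. The sketch also shows one hypothetical way an alias-aware lookup could pick an entry: follow the alias to its write index using a get_alias-style response. `resolve_mapping` is an illustrative helper, not part of eland, and `alias_body` is an assumed example of what `es_client.indices.get_alias(name="my-alias")` returns after one rollover; it was not captured from the cluster in this report.

```python
# Body returned by get_mapping(index="my-alias"), copied from the output above.
mapping_body = {
    "my-index-000002": {"mappings": {}},
    "my-index-000001": {
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {
                    "type": "text",
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
            }
        }
    },
}

# ASSUMED get_alias response shape after one rollover (illustrative only).
alias_body = {
    "my-index-000001": {"aliases": {"my-alias": {"is_write_index": False}}},
    "my-index-000002": {"aliases": {"my-alias": {"is_write_index": True}}},
}


def resolve_mapping(mapping_body, alias_body, name):
    """Return (index, mapping entry) for `name`, following an alias to its write index."""
    if name in mapping_body:  # `name` is already a concrete index
        return name, mapping_body[name]
    write_indices = [
        index
        for index, info in alias_body.items()
        if info.get("aliases", {}).get(name, {}).get("is_write_index")
    ]
    if len(write_indices) != 1:
        raise KeyError(f"cannot resolve a single write index for {name!r}")
    return write_indices[0], mapping_body[write_indices[0]]


# The subscript done in eland/etl.py fails: the body has no key for the alias.
try:
    mapping_body["my-alias"]
except KeyError as exc:
    print("KeyError:", exc)

print(resolve_mapping(mapping_body, alias_body, "my-alias"))
```

Picking the write index is only one possible design choice. In the recorded output the newest index, my-index-000002, has an empty mapping (the template defines no mappings and no document has been written to it yet), so a real fix would also have to decide which mapping to validate the DataFrame against.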

Could anyone confirm whether this behavior is a bug, or advise me on what I'm doing wrong?

Thanks and best regards

akerfx, Jan 22 '25 12:01

This does appear to be a bug, or at least a missing feature, thank you for the report.

pquentin, Jan 22 '25 16:01
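Until pandas_to_eland() can resolve aliases, one possible workaround is to bypass it and send the DataFrame rows to the alias with the Bulk API. Elasticsearch routes writes that target an alias to that alias's write index. The sketch below swaps in `elasticsearch.helpers.bulk` in place of eland's own ingestion; `dataframe_to_actions` and `bulk_append` are hypothetical helper names. Unlike pandas_to_eland(), it performs no mapping compatibility checks and no NaN handling, so treat it as a starting point rather than a drop-in replacement.

```python
from typing import Any, Dict, Iterator


def dataframe_to_actions(df, index: str) -> Iterator[Dict[str, Any]]:
    """Yield one bulk "index" action per DataFrame row (hypothetical helper)."""
    for record in df.to_dict(orient="records"):
        # "_index" may be an alias; Elasticsearch writes to the alias's write index.
        yield {"_index": index, "_source": record}


def bulk_append(es_client, df, alias: str) -> int:
    """Append all rows of `df` to `alias` and return the number of indexed documents."""
    # Imported lazily so the action builder above works without elasticsearch installed.
    from elasticsearch import helpers

    # helpers.bulk returns (success_count, errors); with the default
    # raise_on_error=True it raises BulkIndexError instead of returning failures.
    success, _errors = helpers.bulk(es_client, dataframe_to_actions(df, alias))
    return success
```

With `es_client` and `df` from the script above, the call would be `bulk_append(es_client, df, "my-alias")`.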