
Inconsistency between Eland mapping and Elastic Mapping when appending to an index with `.` in the column names

Open Ashton-Sidhu opened this issue 4 years ago • 1 comments

Eland version: 7.14.1b1 Elasticsearch version: 7.15.1

Issue

If you have a pandas DataFrame with the columns `file.hash.sha256`, `event.id`, `process.name`, and `label`, and run:

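import eland as ed

# df, es (an Elasticsearch client) and index are assumed to be defined already
ed.pandas_to_eland(
    df,
    es_client=es,  # required argument, not shown in the original snippet
    es_dest_index=index,
    es_if_exists="append",
    es_refresh=True,
    use_pandas_index_for_es_ids=False
)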

it succeeds the first time. However, if you take the same DataFrame with the same data and try to insert it a second time, you get the following error:

  File "/Users/sidhuas/protections-cloud/tools/artifacts/rapid-exception-list/rapid_exception_list.py", line 389, in add_shas_to_rapid_exception_list
    ed.pandas_to_eland(
  File "/Users/sidhuas/.pyenv/versions/3.9.1/envs/cloudprotection/lib/python3.9/site-packages/eland/etl.py", line 179, in pandas_to_eland
    verify_mapping_compatibility(
  File "/Users/sidhuas/.pyenv/versions/3.9.1/envs/cloudprotection/lib/python3.9/site-packages/eland/field_mappings.py", line 921, in verify_mapping_compatibility
    raise ValueError(
ValueError: DataFrame dtypes and Elasticsearch index mapping aren't compatible:
- 'event' is missing from DataFrame columns
- 'file' is missing from DataFrame columns
- 'process' is missing from DataFrame columns
- 'event.id' is missing from ES index mapping
- 'file.hash.sha256' is missing from ES index mapping
- 'process.name' is missing from ES index mapping
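
The six messages are what you would expect if the dotted DataFrame column names were compared only against the top-level property names of the index mapping. The sketch below is a hypothetical re-creation of such a check, not Eland's actual code; `top_level_problems` and the literal mapping are made up for illustration, using the columns from this report.

```python
# Hypothetical re-creation of a top-level-only check; NOT Eland's implementation.
df_columns = ["file.hash.sha256", "event.id", "process.name", "label"]

# Top-level "properties" of the index mapping as Elasticsearch stores it.
es_properties = {
    "event": {"properties": {"id": {"type": "keyword"}}},
    "file": {"properties": {"hash": {"properties": {"sha256": {"type": "keyword"}}}}},
    "label": {"type": "double"},
    "process": {"properties": {"name": {"type": "keyword"}}},
}


def top_level_problems(columns, properties):
    """Compare column names with top-level mapping keys only."""
    problems = []
    for key in sorted(properties):
        if key not in columns:
            problems.append(f"- {key!r} is missing from DataFrame columns")
    for key in sorted(columns):
        if key not in properties:
            problems.append(f"- {key!r} is missing from ES index mapping")
    return problems


problems = top_level_problems(df_columns, es_properties)
print("\n".join(problems))
```

Only `label` lines up, because it is the only column without a `.` in its name.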

If you print out the Eland mapping vs. the Elastic index mapping, you get the following:

Eland:

{
   "mappings":{
      "properties":{
         "file.hash.sha256":{
            "type":"keyword"
         },
         "process.name":{
            "type":"keyword"
         },
         "event.id":{
            "type":"keyword"
         },
         "event.module":{
            "type":"keyword"
         },
         "label":{
            "type":"double"
         }
      }
   }
}

Elastic (created when Eland appends for the first time):

{
   "mappings":{
      "properties":{
         "event":{
            "properties":{
               "id":{
                  "type":"keyword"
               }
            }
         },
         "file":{
            "properties":{
               "hash":{
                  "properties":{
                     "sha256":{
                        "type":"keyword"
                     }
                  }
               }
            }
         },
         "label":{
            "type":"double"
         },
         "process":{
            "properties":{
               "name":{
                  "type":"keyword"
               }
            }
         }
      }
   }
}

This makes it hard to use Eland with the Elastic Common Schema (ECS), where field names contain dots.

Expected Behaviour

The data should be appended to the index without issue.

Ashton-Sidhu avatar Dec 08 '21 05:12 Ashton-Sidhu

This looks like a bug to me, thanks for opening! Specifically I think we need to handle nested properties inside of `eland.field_mappings.verify_mapping_compatibility()`.

sethmlarson avatar Dec 09 '21 20:12 sethmlarson
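
One possible way to handle nested properties, sketched here under assumptions and not taken from any Eland fix, is to flatten the nested Elasticsearch mapping into dotted field names before comparing it with the DataFrame columns. `flatten_properties` is a hypothetical helper; it ignores multi-fields (`fields`) and treats anything with a `properties` key as an object to descend into.

```python
def flatten_properties(properties, prefix=""):
    """Flatten nested mapping properties into {dotted.name: type}.

    Hypothetical helper for illustration; not part of Eland.
    """
    flat = {}
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if "properties" in definition:
            # Object field: recurse and extend the dotted path.
            flat.update(flatten_properties(definition["properties"], f"{path}."))
        else:
            # Leaf field: record its Elasticsearch type.
            flat[path] = definition.get("type")
    return flat


# The mapping Elasticsearch created after the first append in this issue.
es_properties = {
    "event": {"properties": {"id": {"type": "keyword"}}},
    "file": {"properties": {"hash": {"properties": {"sha256": {"type": "keyword"}}}}},
    "label": {"type": "double"},
    "process": {"properties": {"name": {"type": "keyword"}}},
}

flat = flatten_properties(es_properties)
df_columns = {"file.hash.sha256", "event.id", "process.name", "label"}

# After flattening, the field names match the DataFrame columns exactly.
print(sorted(flat))
print(set(flat) == df_columns)
```

With the mapping flattened this way, the name comparison for the DataFrame in this report no longer finds any missing fields.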