Inconsistency between Eland mapping and Elastic Mapping when appending to an index with `.` in the column names
Eland version: 7.14.1b1
Elasticsearch version: 7.15.1
Issue
If you have a pandas DataFrame with the columns file.hash.sha256, event.id, process.name, and label, and run:
ed.pandas_to_eland(
    df,
    es_dest_index=index,
    es_if_exists="append",
    es_refresh=True,
    use_pandas_index_for_es_ids=False,
)
it will succeed the first time. However, if you try to insert the same DataFrame with the same data a second time, you get the following error:
  File "/Users/sidhuas/protections-cloud/tools/artifacts/rapid-exception-list/rapid_exception_list.py", line 389, in add_shas_to_rapid_exception_list
    ed.pandas_to_eland(
  File "/Users/sidhuas/.pyenv/versions/3.9.1/envs/cloudprotection/lib/python3.9/site-packages/eland/etl.py", line 179, in pandas_to_eland
    verify_mapping_compatibility(
  File "/Users/sidhuas/.pyenv/versions/3.9.1/envs/cloudprotection/lib/python3.9/site-packages/eland/field_mappings.py", line 921, in verify_mapping_compatibility
    raise ValueError(
ValueError: DataFrame dtypes and Elasticsearch index mapping aren't compatible:
- 'event' is missing from DataFrame columns
- 'file' is missing from DataFrame columns
- 'process' is missing from DataFrame columns
- 'event.id' is missing from ES index mapping
- 'file.hash.sha256' is missing from ES index mapping
- 'process.name' is missing from ES index mapping
If you print the mapping Eland generates from the DataFrame next to the mapping of the Elasticsearch index, you get the following:
Eland:
{
  "mappings": {
    "properties": {
      "file.hash.sha256": {
        "type": "keyword"
      },
      "process.name": {
        "type": "keyword"
      },
      "event.id": {
        "type": "keyword"
      },
      "event.module": {
        "type": "keyword"
      },
      "label": {
        "type": "double"
      }
    }
  }
}
Elastic (created when Eland appends for the first time):
{
  "mappings": {
    "properties": {
      "event": {
        "properties": {
          "id": {
            "type": "keyword"
          }
        }
      },
      "file": {
        "properties": {
          "hash": {
            "properties": {
              "sha256": {
                "type": "keyword"
              }
            }
          }
        }
      },
      "label": {
        "type": "double"
      },
      "process": {
        "properties": {
          "name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
This mismatch makes it hard to use Eland with the Elastic Common Schema (ECS), whose field names are dotted.
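The two mappings above appear to describe the same fields: Elasticsearch interprets dots in a field name as a path through nested objects, so a flat name such as file.hash.sha256 is stored as file > hash > sha256. The following is a minimal pure-Python sketch of that expansion, applied to the four columns listed earlier. `expand_dotted_properties` is a hypothetical helper written for illustration; it is not part of Eland or the Elasticsearch client.

```python
def expand_dotted_properties(properties):
    """Expand dotted field names into nested object mappings.

    Hypothetical helper for illustration only; not an Eland API.
    """
    nested = {}
    for name, field in properties.items():
        node = nested
        *parents, leaf = name.split(".")
        for part in parents:
            # Each parent segment becomes an object with its own "properties".
            node = node.setdefault(part, {}).setdefault("properties", {})
        node[leaf] = field
    return nested


# Flat mapping in the style Eland derives from the DataFrame columns.
eland_properties = {
    "file.hash.sha256": {"type": "keyword"},
    "process.name": {"type": "keyword"},
    "event.id": {"type": "keyword"},
    "label": {"type": "double"},
}

expanded = expand_dotted_properties(eland_properties)
print(expanded["file"])
# {'properties': {'hash': {'properties': {'sha256': {'type': 'keyword'}}}}}
```

The expanded result has the same shape as the index mapping Elasticsearch reports, which suggests the data itself is compatible and only the comparison of field names fails.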
Expected Behaviour
The data should be appended to the index without issue.
This looks like a bug to me, thanks for opening! Specifically, I think we need to handle nested properties inside eland.field_mappings.verify_mapping_compatibility().
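One possible direction, as a rough sketch rather than the actual fix: flatten the nested "properties" of the index mapping into dotted names before comparing them with the DataFrame columns. `flatten_properties` and `compare_columns` are hypothetical names, this is not Eland's current implementation, and the real check also has to compare field types, which is left out here.

```python
def flatten_properties(properties, prefix=""):
    """Flatten nested object mappings into {dotted_name: type}.

    Hypothetical helper for illustration only; not an Eland API.
    """
    flat = {}
    for name, field in properties.items():
        path = f"{prefix}{name}"
        if "properties" in field:
            # Object field: recurse and prefix the children with "parent.".
            flat.update(flatten_properties(field["properties"], prefix=f"{path}."))
        else:
            flat[path] = field.get("type")
    return flat


def compare_columns(df_columns, es_properties):
    """Return mismatch messages between DataFrame columns and ES field names."""
    es_fields = set(flatten_properties(es_properties))
    df_fields = set(df_columns)
    problems = [
        f"- {name!r} is missing from DataFrame columns"
        for name in sorted(es_fields - df_fields)
    ]
    problems += [
        f"- {name!r} is missing from ES index mapping"
        for name in sorted(df_fields - es_fields)
    ]
    return problems


# Index mapping as reported by Elasticsearch in this issue.
es_properties = {
    "event": {"properties": {"id": {"type": "keyword"}}},
    "file": {"properties": {"hash": {"properties": {"sha256": {"type": "keyword"}}}}},
    "label": {"type": "double"},
    "process": {"properties": {"name": {"type": "keyword"}}},
}
df_columns = ["file.hash.sha256", "event.id", "process.name", "label"]

print(compare_columns(df_columns, es_properties))
# []
```

With the nested mapping flattened first, the columns from this report no longer produce any "missing" messages.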