connectors
ensure_content_index_mappings doesn't ensure mappings
Bug Description
ensure_content_index_mappings effectively does nothing, because the connector setup already puts some mappings on the index when it is created, so the function finds existing mappings and skips creating its own. As a result, id and other fields rely on dynamic mapping, which can cause mapping problems.
For example, with network_drive.py the id field will probably be dynamically mapped as a (signed) long, but the connector can index documents whose id is an unsigned long value, which then fails with document_parsing_exception in Elasticsearch:
[FMWK][04:24:31][ERROR]
[Connector id: XH_9rY8BHCAv43wWFIhC, index name: test, Sync job id: C63-rY8B_7UHF08UH-5L]
operation index failed, {'type': 'document_parsing_exception', 'reason': "[1:108] failed to parse field [id] of type [long] in document with id '12368218145543858784'.
Preview of field's value: '12368218145543858784'", 'caused_by': {'type': 'x_content_parse_exception', 'reason': '[1:128] Numeric value (12368218145543858784) out of range of long (-9223372036854775808 - 9223372036854775807)\n at ...
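To make the range problem concrete, here is a minimal sketch that checks the id from the log above against the signed 64-bit range quoted in the error message. The helper name fits_in_long is hypothetical and only used for illustration; it is not part of the connectors code.

```python
# Range of a signed 64-bit integer, as quoted in the error message above.
LONG_MIN = -(2**63)      # -9223372036854775808
LONG_MAX = 2**63 - 1     #  9223372036854775807


def fits_in_long(value: int) -> bool:
    """Return True if value fits in a signed 64-bit long."""
    return LONG_MIN <= value <= LONG_MAX


doc_id = 12368218145543858784  # id taken from the error log above

print(fits_in_long(doc_id))    # False: too large for a signed long
print(0 <= doc_id < 2**64)     # True: still a valid unsigned 64-bit value
```

The value is a valid unsigned 64-bit integer, so a field that dynamic mapping has typed as long from an earlier, smaller id cannot hold it.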
Reproducer
- Spin up Elastic stack 8.13.4.
- Go to Kibana > Search > Content > Connectors.
- Create a new connector with Network drive.
- Click Create and attach an index named XXX.
- Click Convert connector on the right.
- Click Generate API key.
- Click Edit configuration, fill in the settings, and click Save configuration.
- Run elastic-ingest -c /path/to/config.yaml --debug with the config given on the UI.
- Click Sync > Full Content.
elastic-ingest will output "Index test-ensure-content-index-mappings already has mappings, skipping mappings creation."
elastic-ingest will very likely also output a document_parsing_exception like the one above.
Mappings before the sync
{
  "mappings": {
    "dynamic": "true",
    "dynamic_templates": [
      {
        "all_text_fields": {
          "match_mapping_type": "string",
          "mapping": {
            "analyzer": "iq_text_base",
            "fields": {
              "delimiter": {
                "analyzer": "iq_text_delimiter",
                "type": "text",
                "index_options": "freqs"
              },
              "joined": {
                "search_analyzer": "q_text_bigram",
                "analyzer": "i_text_bigram",
                "type": "text",
                "index_options": "freqs"
              },
              "prefix": {
                "search_analyzer": "q_prefix",
                "analyzer": "i_prefix",
                "type": "text",
                "index_options": "docs"
              },
              "enum": {
                "ignore_above": 2048,
                "type": "keyword"
              },
              "stem": {
                "analyzer": "iq_text_stem",
                "type": "text"
              }
            }
          }
        }
      }
    ]
  }
}
Mappings after the sync
{
  "mappings": {
    "dynamic": "true",
    "dynamic_templates": [
      {
        "all_text_fields": {
          "match_mapping_type": "string",
          "mapping": {
            "analyzer": "iq_text_base",
            "fields": {
              "delimiter": {
                "analyzer": "iq_text_delimiter",
                "type": "text",
                "index_options": "freqs"
              },
              "joined": {
                "search_analyzer": "q_text_bigram",
                "analyzer": "i_text_bigram",
                "type": "text",
                "index_options": "freqs"
              },
              "prefix": {
                "search_analyzer": "q_prefix",
                "analyzer": "i_prefix",
                "type": "text",
                "index_options": "docs"
              },
              "enum": {
                "ignore_above": 2048,
                "type": "keyword"
              },
              "stem": {
                "analyzer": "iq_text_stem",
                "type": "text"
              }
            }
          }
        }
      }
    ],
    "properties": {
      "_timestamp": {
        "type": "date"
      },
      "created_at": {
        "type": "date"
      },
      "id": {
        "type": "long"
      },
      "path": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "size": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      },
      "type": {
        "type": "text",
        "fields": {
          "delimiter": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "iq_text_delimiter"
          },
          "enum": {
            "type": "keyword",
            "ignore_above": 2048
          },
          "joined": {
            "type": "text",
            "index_options": "freqs",
            "analyzer": "i_text_bigram",
            "search_analyzer": "q_text_bigram"
          },
          "prefix": {
            "type": "text",
            "index_options": "docs",
            "analyzer": "i_prefix",
            "search_analyzer": "q_prefix"
          },
          "stem": {
            "type": "text",
            "analyzer": "iq_text_stem"
          }
        },
        "analyzer": "iq_text_base"
      }
    }
  }
}
Expected behavior
ensure_content_index_mappings should add missing mappings.
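One way to picture the expected behavior is the following minimal sketch. It is not the actual connectors implementation: the function name missing_properties and the DESIRED_PROPERTIES dictionary (including the choice of keyword for id) are assumptions made for illustration. The idea is to compare the index's current mappings with the properties the framework wants and return only the ones that are absent, instead of skipping as soon as any mappings exist.

```python
# Hypothetical set of properties the framework wants on every content index.
# The field types here are assumptions for illustration only.
DESIRED_PROPERTIES = {
    "id": {"type": "keyword"},
    "_timestamp": {"type": "date"},
}


def missing_properties(existing_mappings: dict, desired: dict) -> dict:
    """Return the desired properties not yet present in the index mappings."""
    existing = existing_mappings.get("properties", {})
    return {name: spec for name, spec in desired.items() if name not in existing}


# Mappings like "Mappings before the sync": dynamic templates, no properties.
before_sync = {"dynamic": "true", "dynamic_templates": []}

to_add = missing_properties(before_sync, DESIRED_PROPERTIES)
print(sorted(to_add))  # ['_timestamp', 'id']
```

The returned dictionary could then be sent to Elasticsearch as the properties of an update mapping request, so that id gets an explicit type before any document is indexed. Fields that already have a mapping are left alone in this sketch, since the type of an existing field generally cannot be changed in place.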
Workaround
Recreate the index without using the Connectors Configuration page, then re-run the sync.
Environment
- OS: Elastic Cloud + Windows 11 10.0.22621 N/A Build 22621
- Browser: Chrome Version 125.0.6422.77 (Official Build) (64-bit)
- Version: docker.elastic.co/enterprise-search/elastic-connectors:8.13.4.0
Additional context
I will open a PR to address this issue.