marqo
marqo copied to clipboard
Nice error for multimodal field and non-multimodal field with the same fieldname [BUG]
Describe the bug A nice error to reject injecting multimodal fields and string fields into the same field
doc1 = {"image":"https://image.png/"}
doc2 = {"image": {"img": "https://some-thing/", "title": "dog walker" }}
mappings = "image": {"type":"multimodal"
"weights":{
"img":0.9,
"title":0.1
}
}
To Reproduce insert both the above docs into the same index (work with both orders). Receive an unhelpful error.
Expected behavior A helpful 4xx error for the problematic doc
Screenshots If applicable, add screenshots to help explain your problem.
The unhelpful error is happening in adding field properties in the backend, because the mapping being added to "my_field_1"
is set to normal text like
'my_field_1': {'type': 'text'}}
and then it looks for non-existent key properties
when trying to set it to a dict.
Because of this, I am weighing 2 possible solutions here, and I would like input on which is better @pandu-k @tomhamer @wanliAlex @jn2clark :
Solution 1. Let the bad mapping (field 1 is both dict and text) get to OpenSearch, then pass down the error from there: We can do this by using:
new_index_properties["my_field_1"]["properties"] = dict()
to bypass the key error caused by the non-existent new_index_properties["my_field_1"]["properties"]
Pros:
- Only documents affected by this bad mapping fail and return errors.
- Same as implementation as our other type mismatch errors, OS determines the error message.
- Returns a 400 Cons:
- Error message is not intuitively helpful, but it might be fine. It says:
{'errors': True,
'index_name': 'my-multimodal-index',
'items': [{'_id': '1',
'error': {'reason': 'object mapping for [__chunks.my_field_1] tried to '
'parse field [image] as object, but found a '
'concrete value',
'type': 'mapper_parsing_exception'},
'status': 400},
{'_id': '2', 'result': 'created', 'status': 201}],
'processingTimeMs': 10380.51678500051}
Solution 2. Catch the problem in backend -> add_customer_field_properties
and return our own error
We can simply do this by making sure there is no overlap between normal and multimodal mappings:
for field in customer_field_names:
if field in multimodal_combination_fields:
raise InvalidArgError()
Pros:
- We can make a match more helpful error message:
Status code: 400
Your add_documents call contains documents with field `my_field_1` as both text and multimodal object. A field may only be 1 of these. For more info on using multimodal indexing, see: https://marqo.pages.dev/0.0.17/#creating-and-searching-indexes-with-multimodal-combination-fields
- We can ensure no OS complications by rejecting the mistake on our end Cons:
- The docs would never reach OpenSearch, add docs call will fail
Feedback.
Option 2: Because Marqo is eventually consistent, we can't guarantee that the index info is up-to-date. For example, another user deletes the index and recreates it with different mappings.
We can still provide this nice error, but we need to also need to handle cases where we don't yet detect and an error, and have to propagate the Marqo-OS error (like option 1).
For Option 1, we can still translate the Marqo-os error into something more helpful (__chunks is an internal concept, that users may unfamiliar with). For example:
Field `image` is given as an object, but it is already defined as another type.
Note: ensure that the Marqo-os error parsing logic needs to be very resilient (catch all keyerrors and type errors while parsing), otherwise the parsing logic may result in an annoying-to-debug 5xx error.