Unable to modify index search field names
The issue occurs when using the automated Microsoft “Indexer” instead of ingesting the documents through the Admin UI in this application. When setting the field names of the new index we need to modify some of the existing field names, but this seems to cause a mapping issue with the application. Despite updating the searchable names to match the index names, we consistently get an error saying that “content_vector”, or another field, “chunk”, is missing. Are certain fields required for the Azure Search and LangChain integration, or are there considerations to make when changing index field names and mapping them to the application?
In short: How can we change the application to search an index using custom field names?
This issue is for a:
- [x] bug report
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Create an index using an indexer with unique field names and attempt to query the new index.
Modified Index (Custom Index)
Updated Searchable Fields (AzureSearchHelper.py)
ENV variables (Set/Default if not defined)
Any log messages given by the failure
Similar errors occur for “content_vector” and “chunk”, depending on which ENV variables were set
azure.core.exceptions.HttpResponseError: () The request is invalid. Details: The property 'chunk' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type. Code: Message: The request is invalid. Details: The property 'chunk' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type. Exception in /api/conversation/custom
{ "error": "() The request is invalid. Details: The property 'chunk' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\nCode: \nMessage: The request is invalid. Details: The property 'chunk' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type." }
Expected/desired behavior
Changing the searchable fields should update and match the new index data structure/naming
OS and Version?
Not specific to OS
Versions
Not Specific
Mention any other details that might be useful
This new application was deployed by individually deploying each service separately and not using the ARM/Bicep template (One-Click Deploy). We have a previously working app using the one-click deploy, but after removing the ingestion code and replacing it with the indexer we have switched to a minimal service approach for only what the web application will use.
Hello @trevor-mcgirr, thank you for raising the issue. I have managed to reproduce it myself.
It appears that the error is caused by the fact that the langchain env vars such as AZURESEARCH_FIELDS_CONTENT_VECTOR affect both the main search index and the conversations index. Therefore when AZURESEARCH_FIELDS_CONTENT_VECTOR=vector, the application is expecting a vector field in the conversations-index, and fails because it cannot find it. It is the same for the other fields that have been renamed.
Ideally, it should be possible to specify different values for the fields across the two indexes. For now, I can think of two workarounds:
- Update the fields for the conversation logger to have similar field names to the search index
- If conversation logging is not required, disable it from the config
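To illustrate the cause described above, here is a minimal sketch of how one shared setting can fit the search index but not the conversations index, and how a per-index override would avoid that. It is not the application's code: the field sets, the `vector_field` helper and the `CONVERSATIONS_FIELDS_CONTENT_VECTOR` variable are all hypothetical, and `content_vector` is assumed to be the default name.

```python
# Field names each index actually defines (hypothetical values for illustration).
SEARCH_INDEX_FIELDS = {"id", "text", "vector"}
CONVERSATIONS_INDEX_FIELDS = {"id", "content", "content_vector"}


def vector_field(env, index_prefix=None):
    """Resolve the vector field name for one index.

    A per-index variable such as CONVERSATIONS_FIELDS_CONTENT_VECTOR is
    hypothetical; it is only used here to show the idea.
    """
    if index_prefix:
        specific = env.get(f"{index_prefix}_FIELDS_CONTENT_VECTOR")
        if specific:
            return specific
    # Shared setting; "content_vector" is assumed to be the default.
    return env.get("AZURESEARCH_FIELDS_CONTENT_VECTOR", "content_vector")


shared = {"AZURESEARCH_FIELDS_CONTENT_VECTOR": "vector"}
# The shared name matches the custom search index ...
search_ok = vector_field(shared) in SEARCH_INDEX_FIELDS
# ... but not the conversations index, which still uses the default name.
conversations_ok = vector_field(shared, "CONVERSATIONS") in CONVERSATIONS_INDEX_FIELDS

# With a per-index override, the lookup resolves to a field that exists.
split = dict(shared, CONVERSATIONS_FIELDS_CONTENT_VECTOR="content_vector")
conversations_fixed = vector_field(split, "CONVERSATIONS") in CONVERSATIONS_INDEX_FIELDS
```

In a real deployment the mapping passed as `env` would be `os.environ`; a plain dict is used here so the behaviour is easy to follow.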
@trevor-mcgirr We have analyzed this issue. To change the field names of the main search index, you need to change them in two places: the create_index function in azure_search_helper.py, and the __convert_to_search_document function in push_embedder.py. Changing the field names in both files will make the application match the new index data structure and naming. Please find the screenshot below for your reference:
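The reason both places have to agree can be sketched with a small check. This is only an illustration, not code from azure_search_helper.py or push_embedder.py: `missing_fields` is a hypothetical helper, and the field names and document are made up.

```python
def missing_fields(index_field_names, document):
    """Return the document keys that the index schema does not define, sorted."""
    return sorted(set(document) - set(index_field_names))


# Hypothetical custom names, as they might be declared when the index is created.
index_fields = ["id", "text", "vector", "title"]

# Hypothetical document still built with one of the old hardcoded keys.
document = {"id": "doc-1", "chunk": 0, "text": "hello", "vector": [0.1, 0.2]}

unknown = missing_fields(index_fields, document)
if unknown:
    print(f"Keys not defined in the index schema: {unknown}")
```

Any key reported here is one the search service would be expected to reject with an error like the "property 'chunk' does not exist" message in the log above.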
FYI @adamdougal @Roopan-Microsoft
@trevor-mcgirr @adamdougal Can you please check and confirm the solution given above? Please also let us know if we can close this issue.
Can we make these configurable via env vars to save the user needing to modify code?
@adamdougal Sure. We will check and let you know soon.
FYI @Roopan-Microsoft
@adamdougal Yes, this can be made configurable via env vars. For that, I need the list of fields that should be configurable through env variables, so that I can also add them to the Bicep files.
@Roopan-Microsoft
@trevor-mcgirr @adamdougal Can you please provide the index search field names that need to be configured in environment variables?
Can we configure all field names?
@adamdougal Yes, we can configure all field names. It would help if you could provide the values of all the field names that need to be configured in env vars.
@trevor-mcgirr @adamdougal Can you please provide the values of all the field names that need to be configured in env vars?
Heya, the values would be what are currently hardcoded. E.g. "id", "content", "content_vector" etc.
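A minimal sketch of that idea, reading each field name from an env var and falling back to the currently hardcoded value. The variable names (`AZURE_SEARCH_FIELDS_ID` and so on), the `IndexFieldNames` class and `load_field_names` are assumptions for illustration, not the names used in the application or in the PR.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexFieldNames:
    """Field names used for the search index (only three shown here)."""
    id: str
    content: str
    content_vector: str


def load_field_names(env=None):
    """Read field names from env vars, defaulting to the hardcoded values."""
    env = os.environ if env is None else env
    return IndexFieldNames(
        # "or" also covers variables that are set but empty.
        id=env.get("AZURE_SEARCH_FIELDS_ID") or "id",
        content=env.get("AZURE_SEARCH_FIELDS_CONTENT") or "content",
        content_vector=env.get("AZURE_SEARCH_FIELDS_CONTENT_VECTOR") or "content_vector",
    )


# No variables set: behaviour is unchanged from the hardcoded names.
defaults = load_field_names({})
# One variable set: only that field name changes.
custom = load_field_names({"AZURE_SEARCH_FIELDS_CONTENT_VECTOR": "vector"})
```

Keeping the defaults equal to the hardcoded names means existing deployments would keep working without setting any new variables.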
Thank you. We will make the changes and update you.
@trevor-mcgirr We have made the code changes and raised a PR, which Adam has reviewed and approved. Since this issue is fixed, can we close it now?
Please find the PR link below for your reference: https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/pull/1198
Hey @trevor-mcgirr - Closing this issue. Please feel free to raise a new ticket if you run into any other issues.