chat-with-your-data-solution-accelerator

Unable to modify index search field names

Open trevor-mcgirr opened this issue 1 year ago • 1 comments

The issue is related to using the automated Microsoft “Indexer” instead of ingesting the documents through the Admin UI in this application. When setting the field names of the new Index we need to modify some of the existing field names, but this seems to cause a mapping issue with the application. Despite updating the searchable names to match the index names, there seems to be a consistent error relating to needing “content_vector” or another field “chunk”. Are certain fields required for Azure Search for Langchain integration or are there considerations to make when changing index field names and mapping to the application?

In short: How can we change the application to search an index using custom field names?

This issue is for a:

- [x] bug report
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Create an index using an indexer with unique field names and attempt to query the new index.

Modified Index (Custom Index): [Screenshot 2024-01-31 at 11 04 39 AM]

Updated Searchable Fields (AzureSearchHelper.py): [Screenshot 2024-01-31 at 11 09 14 AM]

ENV variables (Set/Default if not defined): [Screenshot 2024-01-31 at 11 15 26 AM]

Any log messages given by the failure

Similar errors for “content_vector” and “chunk” based on various ENV variables attempted

azure.core.exceptions.HttpResponseError: () The request is invalid. Details: The property 'chunk' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.
Code:
Message: The request is invalid. Details: The property 'chunk' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.
Exception in /api/conversation/custom

{ "error": "() The request is invalid. Details: The property 'chunk' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type.\nCode: \nMessage: The request is invalid. Details: The property 'chunk' does not exist on type 'search.documentFields'. Make sure to only use property names that are defined by the type." }

Expected/desired behavior

Changing the searchable fields should update and match the new index data structure/naming

OS and Version?

Not specific to OS

Versions

Not Specific

Mention any other details that might be useful

This new application was deployed by individually deploying each service separately and not using the ARM/Bicep template (One-Click Deploy). We have a previously working app using the one-click deploy, but after removing the ingestion code and replacing it with the indexer we have switched to a minimal service approach for only what the web application will use.


Thanks! We'll be in touch soon.

trevor-mcgirr avatar Jan 31 '24 19:01 trevor-mcgirr

Hello @trevor-mcgirr, thank you for raising the issue. I have managed to reproduce it myself.

It appears that the error is caused by the fact that the langchain env vars such as AZURESEARCH_FIELDS_CONTENT_VECTOR affect both the main search index and the conversations index. Therefore when AZURESEARCH_FIELDS_CONTENT_VECTOR=vector, the application is expecting a vector field in the conversations-index, and fails because it cannot find it. It is the same for the other fields that have been renamed.

Ideally, it should be possible to specify different values for the fields across the two indexes. For now, however, I can think of two workarounds:

  • Update the fields for the conversation logger so that they use the same field names as the search index
  • If conversation logging is not required, disable it from the config
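The failure mode described above can be reproduced without Azure at all. The following is only a minimal sketch, not the accelerator's code: the schemas, the `search-index` name, and the per-index override variable (`AZURESEARCH_FIELDS_CONTENT_VECTOR__<INDEX>`) are hypothetical and exist only to illustrate why one shared setting breaks an index that was not renamed, and what a per-index setting might look like.

```python
# Hypothetical schemas: the custom search index was renamed,
# the conversations index still uses the default field names.
INDEX_SCHEMAS = {
    "search-index": {"id", "chunk", "vector"},
    "conversations-index": {"id", "content", "content_vector"},
}

SHARED_KEY = "AZURESEARCH_FIELDS_CONTENT_VECTOR"


def vector_field(index_name, env):
    """Resolve the vector field name for one index.

    A per-index override (hypothetical naming scheme) wins over the
    shared variable, which in turn wins over the default.
    """
    suffix = index_name.upper().replace("-", "_")
    override = env.get(f"{SHARED_KEY}__{suffix}")
    return override or env.get(SHARED_KEY, "content_vector")


def unresolved(env):
    """Return {index_name: field} for every index lacking the resolved field."""
    problems = {}
    for index_name, schema in INDEX_SCHEMAS.items():
        field = vector_field(index_name, env)
        if field not in schema:
            problems[index_name] = field
    return problems


# One shared value: the conversations index has no "vector" field.
shared_only = {SHARED_KEY: "vector"}
print(unresolved(shared_only))

# A per-index override lets the two indexes differ.
with_override = {**shared_only, f"{SHARED_KEY}__CONVERSATIONS_INDEX": "content_vector"}
print(unresolved(with_override))
```

With only the shared variable set, the conversations index is reported as missing the renamed field, which mirrors the error in the issue. The override removes the mismatch without touching the search index.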

cecheta avatar Feb 02 '24 15:02 cecheta

@trevor-mcgirr We have analyzed this issue. To change the field names of the main search index, you need to change them in two places: the create_index function in azure_search_helper.py, and the __convert_to_search_document function in push_embedder.py. Changing the field names in both files will make the application match the new index data structure and naming. Please see the screenshots below for reference:

[four screenshots]
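Since the original screenshots are not available here, the idea of keeping the two places in sync can be sketched as follows. This is an illustration under assumptions, not the contents of azure_search_helper.py or push_embedder.py: `FieldNames`, `build_index_fields`, and `to_search_document` are hypothetical stand-ins, and the schema is expressed as plain dictionaries rather than Azure SDK objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldNames:
    """Single source of truth for index field names (hypothetical helper)."""
    id: str = "id"
    content: str = "content"
    content_vector: str = "content_vector"


def build_index_fields(names):
    """Stand-in for the index definition step: describes the schema as dicts."""
    return [
        {"name": names.id, "type": "Edm.String", "key": True},
        {"name": names.content, "type": "Edm.String", "searchable": True},
        {"name": names.content_vector, "type": "Collection(Edm.Single)", "searchable": True},
    ]


def to_search_document(names, doc_id, text, embedding):
    """Stand-in for the document conversion step: builds one document payload."""
    return {
        names.id: doc_id,
        names.content: text,
        names.content_vector: embedding,
    }


# Renaming fields once updates both the schema and the documents.
custom = FieldNames(content="chunk", content_vector="vector")
schema_names = {field["name"] for field in build_index_fields(custom)}
document = to_search_document(custom, "doc-1", "hello", [0.1, 0.2])
print(schema_names == set(document))
```

Because both functions read from the same `FieldNames` instance, a rename cannot update the index definition while leaving the document payload on the old names, which is the mismatch that produces the "property does not exist" error.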

FYI @adamdougal @Roopan-Microsoft

Prasanjeet-Microsoft avatar Jul 26 '24 10:07 Prasanjeet-Microsoft

@trevor-mcgirr @adamdougal Can you please check and confirm the solution given above? Please let us know if we can close this issue.

Prasanjeet-Microsoft avatar Jul 29 '24 07:07 Prasanjeet-Microsoft

Can we make these configurable via env vars to save the user needing to modify code?

adamdougal avatar Jul 29 '24 09:07 adamdougal

@adamdougal Sure. We will check and let you know soon.

FYI @Roopan-Microsoft

Prasanjeet-Microsoft avatar Jul 29 '24 09:07 Prasanjeet-Microsoft

@adamdougal Yes, this can be made configurable via env vars. For that, I need the list of fields that need to be configured in env variables, so that I can also add them to the Bicep files.

@Roopan-Microsoft

Prasanjeet-Microsoft avatar Jul 29 '24 13:07 Prasanjeet-Microsoft

@trevor-mcgirr @adamdougal Can you please provide us with the index search field names that need to be configured in environment variables?

Prasanjeet-Microsoft avatar Jul 30 '24 06:07 Prasanjeet-Microsoft

Can we configure all field names?

adamdougal avatar Jul 30 '24 09:07 adamdougal

@adamdougal Yes, we can configure all field names. But it would be good if you could provide the values of all field names that need to be configured in env vars.

Prasanjeet-Microsoft avatar Jul 30 '24 11:07 Prasanjeet-Microsoft

@trevor-mcgirr @adamdougal Can you please provide us with the values of all field names that need to be configured in env vars?

Prasanjeet-Microsoft avatar Jul 30 '24 15:07 Prasanjeet-Microsoft

Heya, the values would be what are currently hardcoded. E.g. "id", "content", "content_vector" etc.
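As a rough sketch of what that could look like: each field name is read from an environment variable and falls back to the currently hardcoded value when the variable is unset or empty. The `AZURE_SEARCH_FIELDS_` prefix and the `load_field_names` helper are hypothetical, chosen only for illustration; the variable names actually used by the project may differ.

```python
import os

# Defaults mirror the hardcoded names mentioned above.
DEFAULT_FIELD_NAMES = {
    "id": "id",
    "content": "content",
    "content_vector": "content_vector",
}


def load_field_names(environ=None, prefix="AZURE_SEARCH_FIELDS_"):
    """Read field names from env vars, falling back to the defaults.

    `prefix` is a hypothetical naming convention, e.g.
    AZURE_SEARCH_FIELDS_CONTENT_VECTOR=vector. Unset or empty
    variables keep the default.
    """
    env = os.environ if environ is None else environ
    return {
        key: env.get(prefix + key.upper()) or default
        for key, default in DEFAULT_FIELD_NAMES.items()
    }


# No variables set: behaviour is unchanged for existing deployments.
print(load_field_names({}))

# Only the renamed fields need to be supplied.
print(load_field_names({
    "AZURE_SEARCH_FIELDS_CONTENT": "chunk",
    "AZURE_SEARCH_FIELDS_CONTENT_VECTOR": "vector",
}))
```

Defaulting to the existing names keeps deployments that never set these variables working as before, while a custom index only needs to override the fields it renamed.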

adamdougal avatar Jul 31 '24 12:07 adamdougal

> Heya, the values would be what are currently hardcoded. E.g. "id", "content", "content_vector" etc.

Thank you. We will make the changes and update you.

Prasanjeet-Microsoft avatar Jul 31 '24 15:07 Prasanjeet-Microsoft

@trevor-mcgirr We have made the code changes and raised a PR, which has been reviewed and approved by Adam. Since this issue is fixed, can we close it now?

Please find the PR link below for your reference: https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/pull/1198

Prasanjeet-Microsoft avatar Aug 05 '24 04:08 Prasanjeet-Microsoft

Hey @trevor-mcgirr - Closing this issue. Please feel free to raise a new ticket if you run into any other issues.

Roopan-Microsoft avatar Aug 19 '24 16:08 Roopan-Microsoft