How to add additional metadata to the index from JSON files

Open · svenfeld opened this issue 3 months ago · 1 comment

My data set consists of PDFs that hold the content and JSON files that hold additional metadata. Each JSON file shares its name with the PDF it describes. I want to use the JSON files as additional metadata for the PDFs. My approach was to parse each JSON file into additional index fields, but I did not succeed.

I looked into this: https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/customization.md#other-approaches-to-improve-search-results

I then tried changing searchmanager.py but could not get the result I wanted. I added additional 'SimpleField' entries to the index definition, but these fields were null after running 'prepdocs.py': https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/prepdocslib/searchmanager.py#L106

Each JSON file is structured like this:

{
    "title":"product",
    "isRelevant":true,
    "location":"en",
    "remarks":"comments",
    "thumbnailURL":"https://",
    "fileName":"file.pdf"
}
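
For illustration, a sidecar file with this shape could be read and mapped onto index field values roughly as follows. This is a minimal sketch using only the standard library. The helper name `load_sidecar_metadata`, the `EXPECTED_FIELDS` table, and the idea of locating the JSON by swapping the PDF's extension are assumptions made for this example and are not part of the repository.

```python
import json
from pathlib import Path

# Keys are taken from the JSON example above; the expected Python types
# are assumptions based on the sample values.
EXPECTED_FIELDS = {
    "title": str,
    "isRelevant": bool,
    "location": str,
    "remarks": str,
    "thumbnailURL": str,
    "fileName": str,
}


def load_sidecar_metadata(pdf_path):
    """Hypothetical helper: read <name>.json stored next to <name>.pdf.

    Returns only the known keys whose values have the expected type,
    or an empty dict when no sidecar file exists.
    """
    json_path = Path(pdf_path).with_suffix(".json")
    if not json_path.exists():
        return {}
    with json_path.open(encoding="utf-8") as f:
        raw = json.load(f)
    metadata = {}
    for key, expected_type in EXPECTED_FIELDS.items():
        value = raw.get(key)
        # Skip missing keys and values of an unexpected type.
        if isinstance(value, expected_type):
            metadata[key] = value
    return metadata
```

Filtering to known keys keeps stray JSON properties out of the upload, which matters because a search index generally rejects documents that contain fields missing from its schema.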

Can someone help me create the index correctly? Or suggest another approach that attaches the JSON metadata to the PDFs to improve the search results?
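
One possible explanation for the null values is that adding fields to the index definition only changes the schema: each uploaded document must also carry a value for every new field, and fields omitted from a document are stored as null. The sketch below shows the general idea of copying per-file metadata onto every chunk document before upload. The function name `attach_metadata`, the `reserved` parameter, and the document keys used in the usage lines are hypothetical, and the exact place to call such a function inside prepdocs is not confirmed here.

```python
def attach_metadata(chunk_documents, metadata, reserved=("id", "content", "embedding")):
    """Hypothetical helper: copy file-level metadata onto each chunk document.

    chunk_documents: list of dicts, one per chunk to be uploaded.
    metadata: dict of extra field values shared by all chunks of one file.
    reserved: keys that metadata must never overwrite (assumed names).
    """
    merged = []
    for doc in chunk_documents:
        # Keep existing chunk values; add only keys that are new and not reserved.
        extra = {
            key: value
            for key, value in metadata.items()
            if key not in reserved and key not in doc
        }
        merged.append({**doc, **extra})
    return merged


# Usage with illustrative values only.
chunks = [{"id": "file-0", "content": "first chunk"}, {"id": "file-1", "content": "second chunk"}]
enriched = attach_metadata(chunks, {"title": "product", "isRelevant": True})
```

Every key added this way also needs a matching field in the index definition with a compatible type, otherwise the upload is likely to fail.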

svenfeld, Oct 31 '24 16:10