azure-search-openai-demo
Issue while trying to enable integrated vectorization.
Hi @pamelafox, I have been trying to use integrated vectorization, but after deployment only the search index is created, even after running "azd env set USE_FEATURE_INT_VECTORIZATION true". Please help: I can see that the code supports this feature, but I still hit this issue.
PFA-
My aim is very simple:
1. Upload files directly to blob storage, with no need to run prepdocs.py.
2. Support multi-format documents.
It says that the index has 2,161 documents in it, so it did index something. Or was that from running prepdocs.py before? You should see logs from prepdocs.py that describe the process of setting up the integrated vectorization, please share those as well.
@pamelafox - I have a question on this topic. I was not sure whether to raise a new issue for something I only have questions about, so I am using this open thread. Please suggest if this is not
So I ran for a few days with all these options enabled until I discovered the Integrated Vectorization option.
So I followed the documentation and enabled it. In terms of result quality, what difference can I expect between running with Integrated Vectorization enabled and running without it while using the options below?
There are some differences between local prepdocs ingestion and integrated vectorization, specifically:
- Azure AI Search doesn't use Document Intelligence for cracking. (It may use similar technology behind the scenes, but it may also differ).
- Azure AI Search may have a slightly different text splitting algorithm. It currently doesn't take tokens into account; it just splits based on character count and sentence boundaries. It should be functionally the same for English text, but I wouldn't recommend it for CJK languages at this time.
- Azure AI Search doesn't currently record the page number, according to issues filed here.
If you do see lower quality due to the cracking or splitting algorithm, please write up your findings so that the search team may make improvements as necessary. Thanks!
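To make the splitting difference above concrete, here is a minimal sketch of a splitter that budgets by character count and breaks only at sentence boundaries. It is an illustration only, not the actual Azure AI Search or prepdocs.py algorithm; the function name `split_by_chars` and the `max_chars` parameter are hypothetical.

```python
import re


def split_by_chars(text: str, max_chars: int = 100) -> list[str]:
    """Greedy sentence-boundary splitter with a character budget.

    Hypothetical illustration only; not the Azure AI Search implementation.
    """
    # Split after ASCII sentence punctuation followed by whitespace, or
    # directly after CJK sentence punctuation (which has no trailing space).
    pattern = r"(?<=[.!?])\s+|(?<=[。！？])"
    sentences = [s.strip() for s in re.split(pattern, text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


english = "One two three. Four five six. Seven eight nine."
print(split_by_chars(english, max_chars=30))
```

Because the budget counts characters rather than tokens, the same `max_chars` value can correspond to very different token counts depending on the language. That is one plausible reason the results may differ for CJK text, where a single character often carries more tokens than an English character does.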
Hi @pamelafox, I currently have integrated vectorization enabled in my code and it is running fine, but as you mentioned, I am not able to see the page number in the index. Are you planning to implement it in the future by any chance? Also, how is this integrated vectorization pipeline approach better than the previous approach?
That feature would need to be implemented in the Azure AI Search internal code, not in this repo itself. The Azure AI Search team does not have a public ETA for the feature, but is aware of the need for it.