Pamela Fox
Blocking: https://github.com/Azure/bicep-registry-modules/issues/4387
Thanks for the PR. I've discussed it with @mattgotteiner. He says that the title should not be strictly required for integrated vectorization, but that some developers may want it. If we...
What TPM (tokens per minute) do you currently have for your deployment? Each question takes an average of 1000 tokens, so it is easy to exceed the rate limits if your deployments have low...
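As a rough back-of-the-envelope check, the arithmetic can be sketched like this. The 1000 tokens per question is the average mentioned above; the function name and the 30,000 TPM figure are illustrative assumptions, not values from any particular deployment.

```python
def max_questions_per_minute(tpm_limit: int, tokens_per_question: int = 1000) -> int:
    """Estimate how many questions fit into one minute of token quota.

    tokens_per_question defaults to the ~1000 token average cited above;
    real usage varies with prompt size and retrieved context.
    """
    if tokens_per_question <= 0:
        raise ValueError("tokens_per_question must be positive")
    return tpm_limit // tokens_per_question


# Illustrative only: a hypothetical deployment with a 30,000 TPM quota.
print(max_questions_per_minute(30_000))
```

With a small quota, a handful of concurrent users is enough to hit the limit, so it is worth comparing this estimate against expected traffic.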
We did look into this a bit. Here's some relevant research on HTML vs. plaintext vs. markdown: https://arxiv.org/abs/2411.02959 https://arxiv.org/abs/2406.08100 The reason that we're currently picking HTML for tables is that...
Yeah, makes sense. This is the method that would need changing: DocumentAnalysisParser.table_to_html() in pdfparser.py. You could put a table_to_csv() in there and try that instead. If your tables are still...
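For anyone who wants to try that, here is a minimal sketch of what a table_to_csv() could look like. It is not the repo's implementation: the `Cell` dataclass is a hypothetical stand-in for the table cells returned by Document Intelligence (assumed to expose `row_index`, `column_index`, and `content`), and row/column spans are ignored.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical stand-in for a parsed table cell."""
    row_index: int
    column_index: int
    content: str


def table_to_csv(row_count: int, column_count: int, cells: list[Cell]) -> str:
    """Render table cells as CSV text. Spanned cells are not expanded."""
    # Start with an empty grid so that missing cells become empty fields.
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        grid[cell.row_index][cell.column_index] = cell.content

    buffer = io.StringIO()
    # csv.writer quotes fields that contain commas, quotes, or newlines.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(grid)
    return buffer.getvalue()


cells = [
    Cell(0, 0, "Name"), Cell(0, 1, "Price"),
    Cell(1, 0, "Widget, large"), Cell(1, 1, "4.50"),
]
print(table_to_csv(2, 2, cells))
```

Since CSV has no way to express merged cells, tables with spans may need extra handling before this format is a fair comparison against HTML.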
Thanks for surfacing this. The current splitter attributes a section to the page where it starts, so when a chunk spans pages it can cite the wrong page. I have...
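To illustrate the problem, here is a small sketch of how the pages covered by a chunk could be computed from character offsets. This is only one possible approach under assumed inputs, not the splitter's actual code: `page_offsets` (the starting offset of each page in the concatenated text) and `pages_for_span` are hypothetical names.

```python
from bisect import bisect_right


def pages_for_span(page_offsets: list[int], start: int, end: int) -> list[int]:
    """Return zero-based indexes of all pages that the span [start, end) touches.

    page_offsets must be sorted and hold the offset where each page begins,
    with the first page starting at offset 0.
    """
    first = bisect_right(page_offsets, start) - 1
    # end is exclusive, so the last character of the span is at end - 1.
    last = bisect_right(page_offsets, max(start, end - 1)) - 1
    return list(range(first, last + 1))


# Three pages starting at offsets 0, 100, and 250.
offsets = [0, 100, 250]
print(pages_for_span(offsets, 90, 120))   # chunk starts on page 0, ends on page 1
print(pages_for_span(offsets, 100, 110))  # chunk entirely on page 1
```

Attributing a chunk only to `first` reproduces the behavior described above; keeping the whole list would let a citation name every page the chunk covers.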
@elhele Can you specify what you mean by agentic chunking? The adjective "agentic" can have multiple interpretations these days.
Hm, I think that Document Intelligence already does a bit of that, and with our current splitting logic, it tries not to split things like tables and figures. It may...
I'm checking in with the Azure AI Search team about this. It's possible that an Azure AI Search SDK update would be needed.
Response from AI Search team: This is supported using the latest SDK versions (preview and GA). Here's how to use them with Python: [azure-search-vector-samples/demo-python/code/e2e-demos/azure-ai-search-e2e-build-demo.ipynb at main · Azure/azure-search-vector-samples (github.com)](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/e2e-demos/azure-ai-search-e2e-build-demo.ipynb) ....