chat-with-your-data-solution-accelerator

Support ingesting Excel and CSV files

Open • adamdougal opened this issue 1 year ago • 7 comments

Motivation

To be able to query data in Excel and CSV file format

How would you feel if this feature request was implemented?

excel

Requirements

  • Support ingesting and embedding of excel files
  • Support ingesting and embedding of csv files
  • Ensure relevant data is output in a tabular format on the front end
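For the CSV requirement, one possible approach is to turn each CSV file into Markdown tables before embedding. This is a sketch under stated assumptions, not the accelerator's implementation. The header row is repeated in every chunk so each embedded piece keeps its column names, and a front end that renders Markdown could display the result as a table (an assumption about the UI). The function names and the `rows_per_chunk` default of 50 are hypothetical. Only Python's standard `csv` module is used. Excel files would need a spreadsheet reader or Document Intelligence first.

```python
import csv
import io


def _escape(cell: str) -> str:
    # A literal "|" would split the Markdown cell, so escape it.
    return cell.replace("|", "\\|").strip()


def _render_table(header: list[str], rows: list[list[str]]) -> str:
    """Render one header plus a batch of rows as a Markdown table."""
    width = len(header)
    lines = [
        "| " + " | ".join(_escape(c) for c in header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        # Pad short rows and trim long ones so every line has `width` cells.
        cells = (row + [""] * width)[:width]
        lines.append("| " + " | ".join(_escape(c) for c in cells) + " |")
    return "\n".join(lines)


def csv_to_markdown_chunks(csv_text: str, rows_per_chunk: int = 50) -> list[str]:
    """Hypothetical helper: split CSV text into Markdown tables of at most
    `rows_per_chunk` data rows, repeating the header row in every chunk."""
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to embed
    # Skip blank lines so they don't become empty table rows.
    body = [row for row in reader if any(cell.strip() for cell in row)]
    # A header-only file has no data rows, so this also returns [].
    return [
        _render_table(header, body[i : i + rows_per_chunk])
        for i in range(0, len(body), rows_per_chunk)
    ]


if __name__ == "__main__":
    sample = "region,revenue\nNorth,120\nSouth,95\nEast,101\n"
    for chunk in csv_to_markdown_chunks(sample, rows_per_chunk=2):
        print(chunk, end="\n\n")
```

With `rows_per_chunk=2`, the three sample rows become two tables. North and South go in the first table and East in the second, and each table starts with `| region | revenue |`. Repeating the header costs a few extra tokens per chunk, but it keeps every chunk self-describing when it is retrieved on its own.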

Links

  • https://github.com/teterouge/AOAI-AzureSearch-Excel

Tasks

To be filled in by the engineer picking up the issue

  • [ ] Task 1
  • [ ] Task 2
  • [ ] ...

adamdougal • Apr 12 '24 09:04

@adamdougal Given that Azure Document Intelligence is already used for some of the data loading, and that it supports XLSX and PPTX files, is there a reason these formats aren't supported yet?

Link to documentation: https://learn.microsoft.com/en-gb/azure/ai-services/document-intelligence/concept-layout?view=doc-intel-4.0.0

Maybe the accelerator isn't using an up-to-date preview version of the API?

ferrari-leo • May 15 '24 10:05

@ferrari-leo Heya, I'm not 100% familiar with the history, so I'm not sure whether that support was only added recently. However, given that it is now supported, the only things stopping this are priority and time to implement. We always welcome contributions, so if you need this feature and have time, we'd love a PR :)

adamdougal • May 15 '24 12:05

@adamdougal I'll see when I have some free time! But my main point is that if the Document Intelligence API already supports analysing the layout of an Excel file, why can't I drag an Excel file into the ingest data tab and have it processed? I can't pinpoint where in the code it distinguishes between an Excel file and a PDF and throws the error "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet files are not allowed." In theory, this feature shouldn't need much extra development time, because the capability is already built into the Document Intelligence API.

ferrari-leo • May 15 '24 13:05
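The error quoted in the last comment has the form "<MIME type> files are not allowed.". That form suggests the upload step checks each file's type against an allow-list before Document Intelligence ever sees the file. This thread doesn't establish where that check lives in the accelerator, so the following is only a hypothetical illustration. `MIME_TYPES`, `ALLOWED_EXTENSIONS`, `check_upload`, and the strategy labels are invented names. The point it shows is that even when the analysis API already handles a format, the upload gate and the loader routing still have to be extended before the file is accepted.

```python
from pathlib import Path

# Hypothetical extension -> MIME type table (not the accelerator's config).
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".csv": "text/csv",
}

# Hypothetical allow-list mapping each accepted extension to a loading strategy.
# Supporting Excel/CSV means adding entries here *and* a loader for each strategy.
ALLOWED_EXTENSIONS = {
    ".pdf": "layout",   # e.g. Document Intelligence layout analysis
    ".docx": "layout",
}


class UnsupportedFileType(ValueError):
    """Raised when an uploaded file's extension is not on the allow-list."""


def check_upload(filename: str, allowed: dict[str, str] = ALLOWED_EXTENSIONS) -> str:
    """Return the loading strategy for `filename`, or raise if its type is not allowed."""
    ext = Path(filename).suffix.lower()
    if ext not in allowed:
        # Report the MIME type in the same shape as the error seen in the UI.
        mime = MIME_TYPES.get(ext, "application/octet-stream")
        raise UnsupportedFileType(f"{mime} files are not allowed.")
    return allowed[ext]
```

With the default allow-list, `check_upload("report.xlsx")` raises `UnsupportedFileType` with the same message as the one quoted above. After passing an extended allow-list such as `{**ALLOWED_EXTENSIONS, ".xlsx": "layout", ".csv": "csv"}`, the same call returns `"layout"`. The file would then still need a loader that actually handles that strategy.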