
added fitz and unstructuredio pdf loader py evadb script

Open seansru opened this issue 2 years ago • 4 comments

seansru avatar Nov 04 '23 22:11 seansru

Hi @Stru17, how should EvaDB use the PDFReader?

xzdandy avatar Nov 06 '23 15:11 xzdandy

The idea is to add an additional PDF reader backed by Unstructured IO, not to replace the default PDF reader.

jarulraj avatar Nov 06 '23 20:11 jarulraj

@Stru17 The output and structure of this UnstructuredIOPDFReader class should match that of the original PDFReader class. Currently, it is more of a Python script.
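
As a rough illustration of what "matching the output" could mean, here is a minimal sketch. It assumes the default reader yields one row per paragraph with `page`, `paragraph`, and `data` fields; that row shape is an assumption and should be checked against the EvaDB source. `Element` and `elements_to_rows` are hypothetical names used only for this example, and `Element` is a stand-in for whatever the Unstructured IO parser returns, not a real library type.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional


@dataclass
class Element:
    """Hypothetical stand-in for one parsed PDF element (not a real library type)."""
    text: str
    page_number: Optional[int] = None


def elements_to_rows(elements: Iterable[Element]) -> Iterator[Dict]:
    """Yield one row per non-empty element, numbering paragraphs within each page.

    The row keys ("page", "paragraph", "data") are an assumed target shape.
    """
    paragraph_counts: Dict[int, int] = {}
    for element in elements:
        text = element.text.strip()
        if not text:
            # Skip elements that carry no text.
            continue
        # Fall back to page 1 when the parser gives no page number.
        page = element.page_number or 1
        paragraph_counts[page] = paragraph_counts.get(page, 0) + 1
        yield {"page": page, "paragraph": paragraph_counts[page], "data": text}
```

A reader class could wrap a generator like this in its read method, so downstream code would see the same columns regardless of which PDF backend produced them.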

jarulraj avatar Nov 07 '23 15:11 jarulraj

I have provided an updated script. Please take a look and let me know whether I should change or update it further. I have also messaged the professor on Slack and would appreciate a response as soon as possible!

seansru avatar Nov 20 '23 22:11 seansru