semantra
Import PDF files from a dir
@freedmand
Good job! Semantra runs smoothly on my linux PC!
I think command options like these:
semantra [dir]
semantra [dir1] [dir2] [....]
which import one or more directories containing many PDF files, would be useful and helpful.
Agreed! This seems useful. I'm thinking the behavior that makes sense would be to recursively include .txt and .pdf files when you specify a directory. Do you also think that makes sense?
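Roughly, the directory handling could look like the sketch below. This is only an illustration: `collect_files` and `SUPPORTED_EXTENSIONS` are hypothetical names, not existing Semantra code, and the case-insensitive extension match is an assumption.

```python
from pathlib import Path

# Extensions to pick up from directories (assumed from the discussion above).
SUPPORTED_EXTENSIONS = {".txt", ".pdf"}


def collect_files(paths, extensions=SUPPORTED_EXTENSIONS):
    """Expand a mix of file and directory paths into a sorted list of files.

    Hypothetical helper: directories are searched recursively for files
    whose extension is in `extensions`; explicit file paths are kept as-is.
    """
    found = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # rglob("*") walks the whole directory tree below `path`.
            for child in path.rglob("*"):
                if child.is_file() and child.suffix.lower() in extensions:
                    found.add(child)
        elif path.is_file():
            # A file named explicitly is included regardless of extension.
            found.add(path)
    # Using a set removes duplicates when directories overlap.
    return sorted(found)
```

With something like this, `semantra [dir1] [dir2]` would pass its arguments through `collect_files` and then process the resulting list the same way it handles individual files today.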
Of course! Importing many files of various types, including .txt and .pdf, from a directory would really improve the experience of using semantra.
I think the "Unstructured" package, which is available through Langchain and can parse different file types including .txt and .pdf, may be a good technical solution.
https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/unstructured_file.html
https://github.com/Unstructured-IO/unstructured