semantra
Support Microsoft Office file formats
Most of the documents I would like to search are PowerPoint files (ppt or pptx). It would be nice if PowerPoint and Word documents could be indexed, even without a preview option.
This will be an excellent feature to add.
I'm looking into Apache Tika for this via tika-python. It does require Java to be installed, but it seems robust and is permissively licensed. I'm open to another solution with fewer dependencies, but I haven't found a good one yet.
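For reference, here is a rough sketch of what text extraction through tika-python might look like. It is not how Semantra does it. The helper name `extract_office_text`, the `OFFICE_EXTENSIONS` set, and the injectable `parse` argument are all made up for illustration. The only tika-python call used is `parser.from_file`, which returns a dict with `"metadata"` and `"content"` keys.

```python
from pathlib import Path

# Extensions treated as Office documents in this sketch (an assumption,
# not a list taken from Semantra).
OFFICE_EXTENSIONS = {".doc", ".docx", ".ppt", ".pptx"}


def extract_office_text(path, parse=None):
    """Return the plain text of an Office file, or "" if nothing was extracted.

    `parse` is a hypothetical hook for swapping in another extractor; by
    default tika-python's parser.from_file is used.
    """
    if Path(path).suffix.lower() not in OFFICE_EXTENSIONS:
        raise ValueError(f"Not an Office document: {path}")
    if parse is None:
        # Imported lazily so Java and Tika are only needed when actually used.
        from tika import parser
        parse = parser.from_file
    parsed = parse(str(path))
    # "content" can be None when Tika finds no text in the file.
    return (parsed.get("content") or "").strip()
```

Calling `extract_office_text("slides.pptx")` with the default parser needs Java available, since tika-python starts a local Tika server behind the scenes. Keeping the parser injectable would also make it easy to swap in a lighter dependency later if one turns up.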