paper-qa
Add support to read docx files
Changes:
- Implement parse_docx_to_text function to handle .doc and .docx files.
- Use the existing text-chunking logic to split the parsed docx text.
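For illustration, here is a minimal sketch of a docx reader followed by a chunking step. It relies on the fact that a .docx file is a ZIP archive whose main body lives in `word/document.xml`, with paragraphs in `<w:p>` elements and visible text in `<w:t>` runs. The function body, the `chunk_text` helper, and its `chunk_chars`/`overlap` parameters are assumptions made for this sketch. They are not the PR's actual code or paper-qa's chunking API. The sketch also covers only .docx: legacy .doc files use a binary format, are not ZIP archives, and would need a different parser.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import IO, List, Union

# WordprocessingML namespace used for elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def parse_docx_to_text(source: Union[str, IO[bytes]]) -> str:
    """Sketch only: extract plain text from a .docx (not .doc) file.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; concatenate the text of its <w:t> runs.
    for para in root.iter(f"{W_NS}p"):
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def chunk_text(text: str, chunk_chars: int = 3000, overlap: int = 100) -> List[str]:
    """Hypothetical chunker: fixed-size character windows sharing `overlap` chars."""
    if chunk_chars <= overlap:
        raise ValueError("chunk_chars must be larger than overlap")
    if not text:
        return []
    step = chunk_chars - overlap
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i : i + chunk_chars] for i in range(0, max(len(text) - overlap, 1), step)]


if __name__ == "__main__":
    # Build a tiny in-memory .docx that contains only word/document.xml.
    doc_xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>docx world</w:t></w:r></w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", doc_xml)
    buf.seek(0)
    text = parse_docx_to_text(buf)
    print(text)
    print(chunk_text(text, chunk_chars=8, overlap=2))
```

Overlapping windows are one common choice because a sentence cut at a chunk boundary still appears whole in at least one neighboring chunk when it is shorter than the overlap.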
Bug fixes:
- Convert file names in `paperqa/readers.py` to lowercase before checking extensions to ensure case-insensitive matching.
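The bug fix amounts to normalizing case before comparing extensions, so that a file named `Paper.PDF` is matched like `paper.pdf`. Below is a minimal sketch of that idea. The `pick_parser` helper and the extension-to-parser table are hypothetical and only illustrate the check; the real `paperqa/readers.py` may be organized differently.

```python
import os
from typing import Optional

# Hypothetical extension-to-parser table; keys are stored in lowercase.
_PARSERS_BY_EXTENSION = {
    ".pdf": "pdf",
    ".txt": "text",
    ".html": "html",
    ".doc": "docx",
    ".docx": "docx",
}


def pick_parser(path: str) -> Optional[str]:
    """Return the parser label for `path`, matching extensions case-insensitively."""
    # Lowercase the name first so "Report.DOCX" and "report.docx" resolve the same way.
    _, ext = os.path.splitext(path.lower())
    return _PARSERS_BY_EXTENSION.get(ext)
```

Without the `.lower()` call, `os.path.splitext("Report.DOCX")` yields `".DOCX"`, which would miss the lowercase key and fall through to `None`.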
Looking good. I added some comments; can you also add a unit test?
@jamesbraza Good points. I pushed some changes; please take a look.
Thanks for your contribution! This is now stale - please open a new PR if you're still interested.