paper-qa

Add support to read docx files

Open taabishm2 opened this issue 1 year ago • 1 comments

Changes:
- Implement `parse_docx_to_text` function to handle .doc and .docx files.
- Use text chunking logic for splitting up docx text after parsing.
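To make the two changes above concrete, here is a minimal sketch of the parse-then-chunk flow. It is not the code from this PR: `docx_to_text` and `chunk_text` are hypothetical names, the extraction uses only the standard library (a .docx file is a ZIP archive containing `word/document.xml`) rather than whatever dependency the PR uses, and the fixed-size character chunking with overlap is an assumed stand-in for the project's chunking logic. Legacy binary .doc files are not ZIP archives, so this sketch covers .docx only.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source) -> str:
    """Return the paragraph text of a .docx file (path or file-like object).

    Hypothetical helper for illustration; not the PR's parse_docx_to_text.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's visible text is spread over one or more <w:t> runs.
        text = "".join(run.text or "" for run in para.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def chunk_text(text: str, chunk_chars: int, overlap: int) -> list[str]:
    """Split text into chunks of at most chunk_chars, sharing `overlap` chars."""
    if not 0 <= overlap < chunk_chars:
        raise ValueError("overlap must be >= 0 and smaller than chunk_chars")
    step = chunk_chars - overlap
    return [text[i : i + chunk_chars] for i in range(0, len(text), step)]
```

Usage would look like `chunk_text(docx_to_text("paper.docx"), chunk_chars=3000, overlap=100)`, where the file name and the sizes are placeholders.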

Bug-fixes:
- Convert file names in `paperqa/readers.py` to lowercase before checking extensions to ensure case-insensitive matching.
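The case-insensitive check described in the bug fix could be sketched as follows. The function name `file_kind` and the returned labels are hypothetical and do not reflect the actual contents of `paperqa/readers.py`; the point is only that the extension is lowercased before comparison.

```python
from pathlib import Path


def file_kind(path: str) -> str:
    """Classify a file by extension, ignoring case (illustrative only)."""
    # Lowercase the suffix so "Report.DOCX" and "report.docx" match alike.
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix in {".doc", ".docx"}:
        return "docx"
    if suffix in {".html", ".htm"}:
        return "html"
    return "text"
```

Comparing only the lowercased suffix leaves the rest of the path untouched, which matters on case-sensitive file systems where the original name is still needed to open the file.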

taabishm2 • Sep 13 '24 23:09

Looking good. Added some comments, and can you make a unit test?

@jamesbraza Good points. Pushed some changes, please take a look

taabishm2 • Sep 14 '24 01:09

Thanks for your contribution! This is now stale - please open a new PR if you're still interested.

whitead • Oct 29 '24 19:10