textract
Dependencies update
At the moment, pip complains about these dependency conflicts:
textract 1.6.5 requires argcomplete~=1.10.0, but you have argcomplete 2.0.0 which is incompatible.
textract 1.6.5 requires beautifulsoup4~=4.8.0, but you have beautifulsoup4 4.11.1 which is incompatible.
textract 1.6.5 requires chardet==3.*, but you have chardet 4.0.0 which is incompatible.
textract 1.6.5 requires extract-msg<=0.29.*, but you have extract-msg 0.30.12 which is incompatible.
textract 1.6.5 requires pdfminer.six==20191110, but you have pdfminer-six 20220319 which is incompatible.
textract 1.6.5 requires six~=1.12.0, but you have six 1.16.0 which is incompatible.
textract 1.6.5 requires xlrd~=1.2.0, but you have xlrd 2.0.1 which is incompatible.
So I kindly request the dependencies of this awesome project to be updated wherever and whenever possible.
Just FYI, I manually updated the dependencies in a fork, and afaik, it works fine: https://github.com/deanmalmgren/textract/compare/master...Rafiot:master
I wouldn't necessarily recommend it (which is why I didn't open a PR), as I'm not really competent to figure out whether it breaks anything, but it seems to be ok with Python 3.8+ in this project: https://github.com/pandora-analysis/pandora
Thank you, I'll try running your fork.
The extract-msg conflict seems to be because the module wants to retain support for Python 2.7, and 0.29.* is the last version that supports it. To use later versions, either drop support for Python 2 or use separate dependency lists for different Python versions. Looking at the code they have now for extract-msg, it looks like it would still be perfectly functional.
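For what it's worth, the "separate dependency lists" option could probably be done with PEP 508 environment markers in a single list. This is only a rough sketch, not what textract's setup.py actually contains: the name INSTALL_REQUIRES and the exact version bounds are my assumptions, based only on 0.29.* being the last release that supports Python 2.7.

```python
# Hypothetical excerpt for a setup.py; the variable name and the version
# bounds are illustrative assumptions, not taken from textract itself.
INSTALL_REQUIRES = [
    # Python 2 stays on the last extract-msg series that still supports it.
    'extract-msg<0.30; python_version < "3"',
    # Python 3 is free to pick up newer releases.
    'extract-msg>=0.30; python_version >= "3"',
]

# The list would then be handed to setuptools, for example:
#     setup(..., install_requires=INSTALL_REQUIRES)
```

pip evaluates the marker after the semicolon at install time, so each interpreter only sees the requirement that applies to it.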
Edit: Currently it looks like two conditions could actually cause errors, but they are not related to the module version. Both subject and body will return None if those fields were not found at all in the file, and None does not support being added to bytes. .msg is not only used for Outlook messages, and the files aren't always in the best format, so this could very well happen. In fact, this was reported as happening for some users.
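A guard against the None case could look roughly like the following. This is a minimal sketch under my own assumptions: join_message_fields is a hypothetical helper, not something that exists in textract or extract-msg, and it uses text strings for simplicity where the real parser may be working with bytes.

```python
def join_message_fields(subject, body, separator="\n\n"):
    """Join subject and body, skipping any field that is None or empty.

    Hypothetical helper: avoids the TypeError raised when None is
    concatenated with a string or bytes value.
    """
    parts = [part for part in (subject, body) if part]
    return separator.join(parts)


# Usage: a file with no subject field no longer raises.
text = join_message_fields(None, "Only a body was found.")
```

If both fields are missing, the helper returns an empty string instead of failing, which seems like a reasonable result for a file with no extractable text.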
Yeah, I couldn't really care less about python 2.7, so I'll not do anything about that personally, but if someone else wants to, feel free.