textract

Dependencies update

Open VBobCat opened this issue 2 years ago • 4 comments

For the moment, pip complains about these dependency conflicts:

textract 1.6.5 requires argcomplete~=1.10.0, but you have argcomplete 2.0.0 which is incompatible.
textract 1.6.5 requires beautifulsoup4~=4.8.0, but you have beautifulsoup4 4.11.1 which is incompatible.
textract 1.6.5 requires chardet==3.*, but you have chardet 4.0.0 which is incompatible.
textract 1.6.5 requires extract-msg<=0.29.*, but you have extract-msg 0.30.12 which is incompatible.
textract 1.6.5 requires pdfminer.six==20191110, but you have pdfminer-six 20220319 which is incompatible.
textract 1.6.5 requires six~=1.12.0, but you have six 1.16.0 which is incompatible.
textract 1.6.5 requires xlrd~=1.2.0, but you have xlrd 2.0.1 which is incompatible.

So I kindly request the dependencies of this awesome project to be updated wherever and whenever possible.

May 06 '22 12:05 VBobCat

Just FYI, I manually updated the dependencies in a fork, and afaik, it works fine: https://github.com/deanmalmgren/textract/compare/master...Rafiot:master

I wouldn't necessarily recommend it (which is why I didn't open a PR), as I'm not really competent to figure out whether it breaks anything, but it seems to be OK with Python 3.8+ in this project: https://github.com/pandora-analysis/pandora

May 28 '22 12:05 Rafiot

Thank you, I'll try running your fork.

Jun 09 '22 19:06 VBobCat

The extract-msg conflict seems to exist because textract wants to retain support for Python 2.7, and 0.29.* is the last extract-msg version that supports it. To use later versions, either drop support for Python 2 or use separate dependency lists for different Python versions. Looking at the code textract currently has for extract-msg, it looks like it would still be perfectly functional with newer releases.
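
For illustration, here is a minimal sketch of what separate dependency lists per Python version could look like. The version bounds and the names `PY2_REQUIRES`, `PY3_REQUIRES`, `requirements_for` and `MARKER_REQUIRES` are assumptions made up for this example; they are not taken from textract's setup files and have not been tested against it.

```python
import sys

# Hypothetical pins: these bounds are illustrative, not tested against textract.
PY2_REQUIRES = ["extract-msg<0.30"]   # 0.29.x is described above as the last line supporting 2.7
PY3_REQUIRES = ["extract-msg>=0.30"]


def requirements_for(version_info=sys.version_info):
    """Return the extract-msg pin matching the given interpreter version."""
    return PY2_REQUIRES if version_info[0] < 3 else PY3_REQUIRES


# The same idea expressed declaratively with PEP 508 environment markers,
# which pip evaluates at install time for the running interpreter.
MARKER_REQUIRES = [
    'extract-msg<0.30; python_version < "3"',
    'extract-msg>=0.30; python_version >= "3"',
]
```

The marker form is probably the easier one to maintain, since a single list can be handed to `install_requires` and no code has to run at build time.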

Edit: Currently it looks like two conditions could actually cause errors, but they are not related to the module version. Both subject and body will return None if those fields were not found at all in the file, and None does not support being added to bytes. .msg is not only used for Outlook messages, and the files aren't always in the best format, so this could very well happen. In fact, this was reported as happening for some users.
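
A rough sketch of one way to guard against the missing fields, assuming the message object exposes `subject` and `body` attributes that are either text or None. The helper name `message_text` and the `SimpleNamespace` stand-in are hypothetical; this is not textract's actual parser code.

```python
from types import SimpleNamespace


def message_text(msg, separator="\n\n"):
    """Join subject and body, skipping any field that is None."""
    parts = [part for part in (msg.subject, msg.body) if part is not None]
    return separator.join(parts)


# Stand-in for a parsed .msg file whose subject field is missing.
msg = SimpleNamespace(subject=None, body="hello")
text = message_text(msg)
```

Filtering out None before joining avoids the TypeError without inventing placeholder text for the missing field.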

Yeah, I couldn't really care less about python 2.7, so I'll not do anything about that personally, but if someone else wants to, feel free.

Jun 16 '22 07:06 Rafiot