textract
Reconciling the textract MIT license with code components under *GPL
We're undergoing an internal software audit and identified at least one textract dependency released under the GNU Affero GPL (AGPL): EbookLib.
Our lawyers are getting a bit antsy over this. In general, GPL compatibility means that code released under a different license (e.g. MIT) and combined with GPL'd code must be released under the GPL. This could create a real problem for textract.
We're working around this by disabling all EbookLib-related functionality in our software and uninstalling EbookLib from the virtual environment, so that it is never used or bundled with our software. If we need EPUB support in the future, and assuming nothing else changes in textract, we'd instruct users to install EbookLib themselves. We may also add a click-to-install dialog for the same purpose, warning users that THEY are the ones downloading and installing it.
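A minimal sketch of the "disabled unless the user installs it" approach, assuming a Python code base: the EPUB handler checks for EbookLib at call time and imports it lazily, so nothing AGPL-licensed is loaded or bundled unless the user installed it. The function names are hypothetical; `epub.read_epub`, `get_items_of_type`, and `ebooklib.ITEM_DOCUMENT` are EbookLib's public API, and the result is raw XHTML rather than clean text.

```python
import importlib.util


def epub_support_available() -> bool:
    """Return True if the optional, user-installed EbookLib package is importable."""
    # find_spec returns None when the top-level module cannot be found.
    return importlib.util.find_spec("ebooklib") is not None


def extract_epub_xhtml(path: str) -> str:
    """Hypothetical EPUB handler that only touches EbookLib if the user installed it."""
    if not epub_support_available():
        raise RuntimeError(
            "EPUB support requires EbookLib (AGPL), which is not bundled. "
            "Install it yourself with: pip install EbookLib"
        )
    # Import lazily so that importing this module never loads EbookLib.
    import ebooklib
    from ebooklib import epub

    book = epub.read_epub(path)
    chunks = []
    for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        # get_content() returns the document's raw XHTML as bytes.
        chunks.append(item.get_content().decode("utf-8", errors="replace"))
    return "\n".join(chunks)
```

Keeping the import inside the function means the rest of the application works, and can be shipped, with EbookLib absent from the environment.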
Our situation: our code is Apache-licensed, still private, and soon to be released for open distribution. It relies on textract and other tools to do its job. We found the *GPL code during a self-audit and need to figure out what to do in general; so far none of our basic functionality is affected if we remove the offending code/libraries/tools, so we can afford to skip them. That may not be the case in the future.
We like textract a lot and want to help everyone in its user community avoid inadvertently getting into trouble because of EbookLib or other dependencies with a nasty license. What do you guys think?
Thanks for opening up this as an issue, @pr3d4t0r! I'm glad to hear you are nearing distribution; that is very exciting indeed :)
As I mentioned in our Twitter conversation, perhaps we can make the GPL packages separately installable via `extras_require`, so that we can do things like `pip install textract[GPL]`, which not only installs all of the non-GPL code (which will continue to happen by default) but also installs the GPL'ed code. This StackOverflow answer looks like a good reference for this. What do you think of this solution?
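A rough sketch of how the dependency split could look in textract's `setup.py`. The package lists are illustrative placeholders, not textract's actual requirements, and `resolve_requirements` is a hypothetical helper that only mimics which packages pip would pull in for a given spec; it does not call pip or setuptools.

```python
# Illustrative placeholders: not textract's real requirement list.
INSTALL_REQUIRES = [
    "pdfminer.six",   # permissively licensed (example)
    "python-docx",    # permissively licensed (example)
]

# Copyleft-licensed packages are installed only when explicitly requested.
EXTRAS_REQUIRE = {
    "GPL": ["EbookLib"],  # AGPL-licensed EPUB support
}

# In setup.py these would be passed as:
#   setup(..., install_requires=INSTALL_REQUIRES, extras_require=EXTRAS_REQUIRE)


def resolve_requirements(spec: str) -> list[str]:
    """Mimic which requirements pip pulls in for a spec like 'textract[GPL]'."""
    _, _, rest = spec.partition("[")
    extras = [e.strip() for e in rest.rstrip("]").split(",") if e.strip()]
    requirements = list(INSTALL_REQUIRES)
    for extra in extras:
        # pip warns about unknown extras instead of failing; here we just skip them.
        requirements.extend(EXTRAS_REQUIRE.get(extra, []))
    return requirements
```

With this split, `pip install textract` would resolve only the base list, while `pip install textract[GPL]` would add EbookLib on top, so users opt in to the copyleft code explicitly.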
Also, how will your distribution handle the unix commands that textract requires? Are those GPL too?
@pr3d4t0r, have you given this any additional thought? textract intrinsically relies not only on GPL Python packages but also on various unix commands. I certainly want to respect these licenses as we distribute textract, but if we don't need to complicate textract's distribution, that would obviously be great.
I just set up pyup for monitoring dependencies on this project, and they have a nifty tool for seeing the licenses of dependent projects. I thought I'd mention it here in case it's helpful for framing the conversation.
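pyup's own tool isn't reproduced here. As a rough local alternative, assuming only the Python standard library (`importlib.metadata`), the sketch below scans installed distributions for their declared `License` field and `License ::` classifiers and flags anything that mentions the GPL family. The function names are hypothetical, and license metadata is self-reported by package authors, so treat the output as a triage aid, not a legal determination. It also says nothing about the unix commands textract shells out to.

```python
from importlib import metadata


def license_summary(meta) -> str:
    """Best-effort license string built from a distribution's metadata."""
    parts = []
    declared = meta.get("License")
    if declared and declared.strip() and declared.strip().upper() != "UNKNOWN":
        # Some packages paste the full license text; keep only the first line.
        parts.append(declared.strip().splitlines()[0])
    for classifier in meta.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            # e.g. "License :: OSI Approved :: GNU Affero General Public License v3"
            parts.append(classifier.split(" :: ")[-1])
    return "; ".join(parts) or "unknown"


def looks_copyleft(summary: str) -> bool:
    """Crude check: matches GPL, LGPL, AGPL and their spelled-out names."""
    upper = summary.upper()
    return "GPL" in upper or "GENERAL PUBLIC LICENSE" in upper


def flag_copyleft() -> dict[str, str]:
    """Map installed distribution names to license summaries that look copyleft."""
    flagged = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:
            continue
        name = meta.get("Name")
        if not name:
            continue
        summary = license_summary(meta)
        if looks_copyleft(summary):
            flagged[name] = summary
    return flagged


if __name__ == "__main__":
    for name, lic in sorted(flag_copyleft().items()):
        print(f"{name}: {lic}")
```

Run inside the project's virtual environment, this lists the packages worth a closer look (EbookLib would appear if installed).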