fuzzywuzzy
Propose to change the license to MIT
Hello,
We at Synthesized LTD really like your library and would like to use it in production. However, the license is too restrictive for us.
Would you mind changing it to a more permissive license, such as MIT?
Cheers, Denis
This license change is not that simple. The library includes a copy of the Levenshtein implementation from https://github.com/ztane/python-Levenshtein, which is GPL licensed, as noted here: https://github.com/xdrop/fuzzywuzzy/issues/84. I maintain a complete rewrite for C++/Python, which could be ported to Java or wrapped using JNI if someone wants an MIT licensed implementation.
Thanks for the explanation @maxbachmann! I think we could help with a port/JNI wrapper ;) It seems we can allocate one person to work on this topic.
Sounds good. You can find the C++ sources here and the Python wrapper here. I do not know whether a wrapper or a Java port is better for performance. I assume a wrapper is less maintenance work. Someone started to write a JNI wrapper for Java on Android last year: https://github.com/MuntashirAkon/rapidfuzz-android/ and you can find some discussion regarding this here. However, this seems abandoned and has a memory leak in pretty much every function.
Hey, I was wondering if there are any updates related to the licensing topic. Did anyone start work on the JNI wrapper?
I am not aware of anyone working on the JNI wrapper yet. I am not familiar with Java myself, so this would need the help of someone familiar with Java + JNI to write the wrapper. I am willing to help with any questions, especially on the following topics:
- questions regarding the C++ version
- questions regarding the Python wrapper
- help with code review of the JNI wrapper
For the Python wrapper, the main complexities are the following:
- The Python wrapper supports any iterable of hashable elements. This is still not optimal, since it does not handle hash collisions, but it is good enough for now. If the JNI version only supports strings, this should be fairly unproblematic.
- I maintain a pure Python version as a fallback for systems where the C++ version fails to compile. The implementations in Python are fairly simple, since Python supports arbitrarily large integers, but they still require some maintenance. This is not required for the JNI wrapper.
- The process module allows processing lists of hashable sequences, so users can e.g. compare one string to a list of strings. This is very helpful, since it allows a lot of optimizations like:
  - usage of multiprocessing
  - caching of common state
  - use of SIMD
  - reduction of calls into Python
Most of these topics in the process module should be relevant for the JNI wrapper as well, but this is certainly the most complex part to implement. That said, a basic implementation of the process module without these optimizations should be fairly simple and would probably be fast enough for the majority of users.
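To make the "caching of common state" point and the remark about arbitrarily large integers a bit more concrete, here is a rough sketch in pure Python. It uses a well-known bit-parallel Levenshtein formulation (Myers/Hyyrö style); I am not claiming this is exactly what the library does internally. The names `CachedLevenshtein` and `closest_match` are made up for this example and are not part of any existing API. The per-character bitmasks of the query are built once and reused for every choice, and because Python integers are unbounded, the query can be longer than 64 characters without any extra block handling.

```python
class CachedLevenshtein:
    """Hypothetical helper: caches the query's bitmasks for repeated comparisons."""

    def __init__(self, query):
        self.length = len(query)
        # One bitmask per distinct element; bit i is set if query[i] == element.
        # Any hashable element works, not only characters.
        self.masks = {}
        for i, element in enumerate(query):
            self.masks[element] = self.masks.get(element, 0) | (1 << i)

    def distance(self, other):
        m = self.length
        if m == 0:
            return len(other)
        full = (1 << m) - 1      # keeps the state at m bits (Python ints are unbounded)
        last = 1 << (m - 1)      # bit that tracks the last row of the DP matrix
        vp, vn = full, 0         # vertical positive / negative deltas
        dist = m
        for element in other:
            x = self.masks.get(element, 0) | vn
            d0 = (((x & vp) + vp) ^ vp) | x
            hp = vn | (~(d0 | vp) & full)
            hn = d0 & vp
            if hp & last:
                dist += 1
            if hn & last:
                dist -= 1
            hp = (hp << 1) | 1
            hn <<= 1
            vp = (hn | ~(d0 | hp)) & full
            vn = hp & d0
        return dist


def closest_match(query, choices):
    """Hypothetical process-style helper: one query against many choices."""
    scorer = CachedLevenshtein(query)  # common state is built only once
    best = None
    for choice in choices:
        d = scorer.distance(choice)
        if best is None or d < best[1]:
            best = (choice, d)
    return best


print(closest_match("kitten", ["sitting", "mitten", "kitchen"]))  # ('mitten', 1)
```

A JNI wrapper that only supports strings could presumably follow the same shape: build the cached state once per query on the native side and loop over the choices there, which also keeps the number of calls across the JNI boundary low.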