mindsdb
[Bug]: Checking if a handler is installed by importing it is unreliable
Short description of current behavior
When MindsDB starts up, it checks which handlers are installed by trying to import all handlers and catching any errors.
This check is unreliable, e.g. when a handler relies on optional dependencies of its own dependencies:
- the file handler depends on langchain
- langchain has an optional dependency on pypdf for opening pdf files
- the file handler will import successfully even if pypdf isn't installed, but will throw an error if the user tries to interact with a pdf file.
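To illustrate the failure mode, here is a minimal sketch of an import-based check. It is not MindsDB's actual startup code: handler_importable and read_pdf are hypothetical names, and the lazy import inside read_pdf is only an assumption about how an optional dependency typically gets pulled in.

```python
import importlib


def handler_importable(module_name):
    """Import-based check: True if the module imports without raising."""
    try:
        importlib.import_module(module_name)
    except Exception:
        return False
    return True


def read_pdf(path):
    """Hypothetical handler code path that needs an optional dependency."""
    # The import only runs when this function is called, so importing the
    # enclosing module succeeds even when pypdf is not installed.
    import pypdf

    return pypdf.PdfReader(path)
```

A module containing read_pdf passes handler_importable regardless of whether pypdf is present; the missing package only surfaces when a pdf file is actually opened.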
Video or screenshots
No response
Expected behavior
We should check whether or not an extra is actually installed. The problem is that pip doesn't provide a way to do this: https://github.com/pypa/packaging-problems/issues/215#issuecomment-504569821
The best way I can think of is to run pip install --dry-run mindsdb[handler] and check whether any new packages are required. Unfortunately --dry-run still downloads packages - perhaps we could monitor output and kill pip as soon as it starts downloading something? (this would indicate that the extra is not installed)
Note about running pip from code: https://stackoverflow.com/a/50255019
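A rough sketch of the dry-run idea, run through a subprocess rather than by importing pip. It assumes a pip version recent enough to support --dry-run together with --report (which writes a JSON installation report), and it assumes the report's "install" list contains only packages that are not already satisfied. The function names are hypothetical, and this does not avoid the download problem mentioned above.

```python
import json
import subprocess
import sys


def packages_to_install(report):
    """Return the names of the packages listed in a parsed pip report."""
    return [item["metadata"]["name"] for item in report.get("install", [])]


def extra_is_satisfied(target):
    """Dry-run `pip install <target>`; True if pip would install nothing new.

    Assumption: pip supports --dry-run and --report. --quiet keeps log
    output from mixing with the JSON written to stdout.
    """
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install",
         "--dry-run", "--quiet", "--report", "-", target],
        capture_output=True,
        text=True,
        check=True,
    )
    return not packages_to_install(json.loads(result.stdout))
```

For example, extra_is_satisfied("mindsdb[handler]") would return False when pip reports at least one package it still needs to install.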
How to reproduce the error
No response
Anything else?
No response
In this example, if the file handler can be imported then it can be used, so the check is still reliable. A human-readable error should be thrown if the user tries to add a pdf and doesn't have langchain installed, but that is not related to the file handler itself. Also, we don't have any way in the HTTP or MySQL API to let the user know that an optional dependency is not installed. As a future solution, it may be good to have a table in the GUI for each handler, listing its dependencies with a check mark beside each one that shows whether it was imported successfully.
"In this example if file handler can be imported then it can be used"
This isn't always true though.
If you do this:
pip install .[langchain]
pip uninstall wikipedia
python -m mindsdb
MindsDB will tell you that the langchain handler is available - but if a user tries to use the wikipedia functionality it will crash.
I have run into this issue organically; I just can't remember which handlers were involved. Unfortunately there is no easy solution for this.
FWIW I've come up with a mostly-solution to this, but it's just a proof of concept atm.
Basically you:
- Get a list of the currently installed packages and their versions with pip freeze
- Read the requirements file of a desired handler and parse the required versions with the packaging package
- Compare the installed and required versions of each package with packaging
This only checks first-level dependencies, but I think it will be sufficient in 99% of cases. If a first-level dependency is installed, then its own dependencies should have been installed at the same time.
It's basically impossible to check the whole tree. johnnydep exists for this, but it is very slow because it needs to download, and in some cases build, packages before it can work out dependencies.
Our only other option would be to compile the full dependency tree for each handler and bake it into our releases, but that seems overkill.
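The three steps of the proof of concept could look roughly like the sketch below. It is not the actual proof of concept: it reads installed versions from importlib.metadata instead of parsing pip freeze output (a swapped-in technique), the function names are hypothetical, and it assumes the packaging package is importable.

```python
import re
from importlib import metadata

from packaging.requirements import InvalidRequirement, Requirement
from packaging.utils import canonicalize_name


def installed_versions():
    """Map canonical package name -> installed version string."""
    versions = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:
            versions[canonicalize_name(name)] = version
    return versions


def unmet_requirements(requirement_lines, installed):
    """Return the requirement lines that `installed` does not satisfy.

    Only first-level requirements are checked, as described above.
    """
    unmet = []
    for raw in requirement_lines:
        # Drop comments (a '#' at line start or after whitespace).
        line = re.sub(r"(^|\s+)#.*$", "", raw).strip()
        # Skip blanks and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        try:
            req = Requirement(line)
        except InvalidRequirement:
            unmet.append(line)  # cannot verify it, so treat it as unmet
            continue
        # Ignore requirements whose environment marker does not apply here.
        if req.marker is not None and not req.marker.evaluate():
            continue
        version = installed.get(canonicalize_name(req.name))
        if version is None or not req.specifier.contains(version, prereleases=True):
            unmet.append(line)
    return unmet
```

A handler would then count as installed when unmet_requirements(lines_from_its_requirements_file, installed_versions()) returns an empty list.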