
[Bug]: Checking if a handler is installed by importing it is unreliable

Open hamishfagg opened this issue 7 months ago • 3 comments

Short description of current behavior

When MindsDB starts up, it checks which handlers are installed by trying to import all handlers and catching any errors.

This is an unreliable check, e.g. when a handler relies on optional dependencies of the packages it uses:

  • the file handler depends on langchain
  • langchain has an optional dependency on pypdf for opening pdf files
  • the file handler will import successfully even if pypdf isn't installed, but will throw an error if the user tries to interact with a PDF file.
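To see why a successful import proves so little, here is a small self-contained demonstration. It does not use langchain itself: a hypothetical module `demo_file_handler` stands in for the file handler, and `optional_pdf_lib_not_installed` stands in for an optional dependency such as pypdf. As in the situation above, the optional package is only imported when a PDF is actually opened, so the import check passes while the feature is unusable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical handler module: the optional dependency is imported lazily,
# inside the function that needs it, not at module import time.
HANDLER_SOURCE = """
def read_pdf(path):
    import optional_pdf_lib_not_installed  # stand-in for an optional dependency like pypdf
    return optional_pdf_lib_not_installed.open(path)
"""


def import_check_vs_usage():
    """Return (import check passed, PDF feature actually usable)."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "demo_file_handler.py").write_text(HANDLER_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            try:
                handler = importlib.import_module("demo_file_handler")
            except ImportError:
                return False, False
            try:
                handler.read_pdf("example.pdf")
                usable = True
            except ImportError:  # raised only now, when the feature is used
                usable = False
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_file_handler", None)
    return True, usable


print(import_check_vs_usage())  # (True, False)
```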

Video or screenshots

No response

Expected behavior

We should check whether or not an extra is actually installed. The problem is that pip doesn't provide a way to do this: https://github.com/pypa/packaging-problems/issues/215#issuecomment-504569821

The best way I can think of is to run pip install --dry-run mindsdb[handler] and check whether any new packages are required. Unfortunately, --dry-run still downloads packages. Perhaps we could monitor the output and kill pip as soon as it starts downloading something, since that would indicate that the extra is not installed?

Note about running pip from code: https://stackoverflow.com/a/50255019
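The dry-run idea could be sketched as follows, running pip as a subprocess through `sys.executable -m pip` so that it inspects the same environment MindsDB runs in. This is a minimal sketch under loud assumptions: it needs pip 22.2 or newer (the release that added `--dry-run`), and it assumes that pip's human-readable log lines starting with "Collecting", "Downloading" or "Using cached" mean a package would have to be fetched. pip does not treat that output as a stable interface. `needs_download` and `extra_is_installed` are hypothetical names.

```python
import subprocess
import sys
from collections.abc import Iterable

# ASSUMPTION: these prefixes match pip's log output for packages that are not
# yet installed. pip's human-readable output can change between releases.
FETCH_MARKERS = ("Collecting ", "Downloading ", "Using cached ")


def needs_download(lines: Iterable[str]) -> bool:
    """Return True at the first line showing pip fetching a package.

    Stops consuming `lines` at the first match, so the caller can kill pip
    immediately instead of waiting for the download to finish.
    """
    for line in lines:
        if line.strip().startswith(FETCH_MARKERS):
            return True
    return False


def extra_is_installed(requirement: str) -> bool:
    """Hypothetical helper: True if `pip install --dry-run <requirement>` fetches nothing."""
    with subprocess.Popen(
        [sys.executable, "-m", "pip", "install", "--dry-run", requirement],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams so every log line is scanned
        text=True,
    ) as proc:
        if needs_download(proc.stdout):
            proc.kill()  # stop pip before the download completes
            return False
        # pip finished without fetching anything; make sure it did not simply fail.
        if proc.wait() != 0:
            raise RuntimeError(f"pip failed while checking {requirement!r}")
        return True
```

A call would look like `extra_is_installed("mindsdb[langchain]")`. Keeping the line scanning in a separate pure function means the fragile part (the output prefixes) can be tested without running pip.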

How to reproduce the error

No response

Anything else?

No response

hamishfagg, Nov 16 '23 22:11

In this example, if the file handler can be imported then it can be used, so the check is still reliable. A human-readable error should be thrown if the user tries to add a PDF without langchain installed, but that is not related to the file handler itself. Also, neither the HTTP API nor the MySQL API gives us any way to let the user know that an optional dependency is not installed. As a future solution, it might be good to have a table in the GUI for each handler, listing its dependencies with a check mark beside each one showing whether it was imported successfully.
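The per-dependency table proposed above could be backed by something like this minimal sketch. It assumes each handler declares the import names of its dependencies (a hypothetical list; import names can differ from pip names, e.g. scikit-learn imports as `sklearn`). `dependency_status` and `render_checklist` are hypothetical helpers. Like the startup check, it only reports whether each import succeeds, so it shares the limitation described in the issue.

```python
import importlib


def dependency_status(modules):
    """Try to import each module and record whether it succeeded."""
    status = {}
    for module in modules:
        try:
            importlib.import_module(module)
            status[module] = True
        except Exception:  # ImportError, or any other error raised while importing
            status[module] = False
    return status


def render_checklist(handler, status):
    """Format one handler's row of the proposed table as a plain-text checklist."""
    rows = [f"{handler}:"]
    for module, ok in status.items():
        rows.append(f"  [{'x' if ok else ' '}] {module}")
    return "\n".join(rows)


# Hypothetical dependency list for the file handler (import names, not pip names).
print(render_checklist("file_handler", dependency_status(["langchain", "pypdf"])))
```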

StpMax, Nov 17 '23 14:11

Regarding "In this example if file handler can be imported then it can be used": this isn't always true, though.

If you do this:

pip install .[langchain]
pip uninstall wikipedia
python -m mindsdb

MindsDB will report that the langchain handler is available, but it will crash if a user tries to use the wikipedia functionality.

I have run into this issue organically; I just can't remember which handlers were involved. Unfortunately, there is no easy solution for this.

hamishfagg, Nov 21 '23 00:11

FWIW, I've come up with a partial solution to this, but it's only a proof of concept at the moment.

Basically you:

  • Get a list of the currently installed packages and their versions with pip freeze
  • Read the requirements file of a desired handler and parse the required versions with the packaging package
  • Compare the installed and required versions of each package with packaging

This only checks first-level dependencies, but I think it will be sufficient in 99% of cases: if a first-level dependency is installed, then its own dependencies should have been installed at the same time.
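The three steps above might look roughly like this minimal sketch. It uses the packaging package as described, but the helper names (`pip_freeze`, `installed_versions`, `unmet_requirements`) are hypothetical, it assumes the handler's requirements file contains plain requirement lines, and it checks first-level requirements only.

```python
from __future__ import annotations

import subprocess
import sys

from packaging.requirements import Requirement
from packaging.utils import canonicalize_name


def pip_freeze() -> str:
    """Step 1: list installed packages with `pip freeze`, run via this interpreter."""
    # --all also lists packages that freeze skips by default, such as pip and setuptools.
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze", "--all"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def installed_versions(freeze_output: str) -> dict[str, str]:
    """Parse "name==version" lines into {canonical name: version}."""
    versions = {}
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "-")):  # comments and -e editable installs
            continue
        name, sep, version = line.partition("==")
        if sep:  # "name @ URL" lines carry no version and are skipped
            versions[canonicalize_name(name)] = version
    return versions


def unmet_requirements(requirement_lines, installed: dict[str, str]) -> list[str]:
    """Steps 2 and 3: return first-level requirements that are missing or out of range."""
    unmet = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # blank lines and pip options such as -r
            continue
        req = Requirement(line)
        if req.marker is not None and not req.marker.evaluate():
            continue  # requirement does not apply here, e.g. a Windows-only package on Linux
        version = installed.get(canonicalize_name(req.name))
        if version is None or not req.specifier.contains(version, prereleases=True):
            unmet.append(str(req))
    return unmet
```

Usage would be along the lines of `unmet_requirements(Path(".../requirements.txt").read_text().splitlines(), installed_versions(pip_freeze()))`, with the path pointing at the handler's requirements file; an empty list would mean every first-level requirement is satisfied.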

It's basically impossible to check the whole tree. johnnydep exists for this, but it is very slow because it needs to download, and in some cases build, packages before it can work out their dependencies.

Our only other option would be to compile the full dependency tree for each handler and bake it into our releases, but that seems like overkill.

hamishfagg, Jan 31 '24 21:01