DataProfiler test failures caused by missing_module

General Information:

OS: Ubuntu 20.04
Python version: 3.10.4
Library version: 0.8.4

Describe the bug: When I run python3 -m unittest dataprofiler/tests/reports/test_graphs.py, test_no_matplotlib seems to break some internal state within matplotlib.

After it runs, multiple other tests in test_graphs.py fail with the error:

======================================================================
ERROR: test_null_list (dataprofiler.tests.reports.test_graphs.TestPlotMissingValuesMatrix)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/codespace/.python/current/lib/python3.10/unittest/mock.py", line 1369, in patched
    return func(*newargs, **newkeywargs)
  File "/workspaces/DataProfiler/dataprofiler/tests/reports/test_graphs.py", line 222, in test_null_list
    fig = graphs.plot_missing_values_matrix(profiler)
  File "/workspaces/DataProfiler/dataprofiler/reports/utils.py", line 50, in new_f
    return f(*args, **kwds)
  File "/workspaces/DataProfiler/dataprofiler/reports/graphs.py", line 208, in plot_missing_values_matrix
    return plot_col_missing_values(profiler.profile, ax=ax, title=title)
  File "/workspaces/DataProfiler/dataprofiler/reports/utils.py", line 50, in new_f
    return f(*args, **kwds)
  File "/workspaces/DataProfiler/dataprofiler/reports/graphs.py", line 259, in plot_col_missing_values
    fig = plt.figure()
  File "/workspaces/DataProfiler/venv/lib/python3.10/site-packages/matplotlib/_api/deprecation.py", line 454, in wrapper
    return func(*args, **kwargs)
  File "/workspaces/DataProfiler/venv/lib/python3.10/site-packages/matplotlib/pyplot.py", line 771, in figure
    manager = new_figure_manager(
  File "/workspaces/DataProfiler/venv/lib/python3.10/site-packages/matplotlib/pyplot.py", line 346, in new_figure_manager
    _warn_if_gui_out_of_main_thread()
  File "/workspaces/DataProfiler/venv/lib/python3.10/site-packages/matplotlib/pyplot.py", line 336, in _warn_if_gui_out_of_main_thread
    if (_get_required_interactive_framework(_get_backend_mod()) and
  File "/workspaces/DataProfiler/venv/lib/python3.10/site-packages/matplotlib/pyplot.py", line 206, in _get_backend_mod
    switch_backend(dict.__getitem__(rcParams, "backend"))
  File "/workspaces/DataProfiler/venv/lib/python3.10/site-packages/matplotlib/pyplot.py", line 251, in switch_backend
    switch_backend(candidate)
  File "/workspaces/DataProfiler/venv/lib/python3.10/site-packages/matplotlib/pyplot.py", line 288, in switch_backend
    class backend_mod(matplotlib.backend_bases._Backend):
  File "/workspaces/DataProfiler/venv/lib/python3.10/site-packages/matplotlib/_api/__init__.py", line 224, in __getattr__
    raise AttributeError(
AttributeError: module 'matplotlib' has no attribute 'backend_bases'

When the tests are run individually they work fine:

/workspaces/DataProfiler ❯❯❯ python3 -m unittest dataprofiler.tests.reports.test_graphs.TestPlotMissingValuesMatrix.test_null_list

.
----------------------------------------------------------------------
Ran 1 test in 0.069s

OK

If I comment out test_no_matplotlib then all of the tests pass when running python3 -m unittest dataprofiler/tests/reports/test_graphs.py

Dec 03 '22 05:12 leos

@leos Interesting. I was not able to reproduce the failure on py3.9, so it might be caused by a change in py3.10+.

However, you could try altering the missing_module_function inside that test class by adding the following to the end of it:

        import importlib  # can be moved to top of function w/ import within the mock above
        for module in modules_to_remove:
            if module in sys.modules:
                del sys.modules[module]
                importlib.import_module(module)

This might fix the issue, as it ensures the cached module gets reset after the test runs too.
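As a rough illustration of the same goal (leaving sys.modules in its original state once the test finishes), here is a minimal sketch of a different technique: unittest.mock.patch.dict, which snapshots sys.modules and restores it on exit. This is not the code in test_graphs.py; the stdlib module colorsys is only a stand-in for matplotlib, and the variable names are hypothetical.

```python
import importlib
import sys
from unittest import mock

# Assumption: "colorsys" stands in for the module the test wants to hide.
MODULE = "colorsys"

# Make sure the module is cached, and remember the cached object.
original = importlib.import_module(MODULE)

# Setting the sys.modules entry to None makes any import of that name
# raise ModuleNotFoundError (a subclass of ImportError).
with mock.patch.dict(sys.modules, {MODULE: None}):
    try:
        importlib.import_module(MODULE)
        import_failed = False
    except ImportError:
        import_failed = True

# On exit, patch.dict restores sys.modules to its earlier contents,
# so the very same module object is back in the cache.
restored = sys.modules[MODULE]
```

Because the original module object is put back rather than re-imported, anything that already held a reference to it keeps pointing at the cached module.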

Dec 03 '22 06:12 JGSweets

I can try that, but it's a bit concerning that CI doesn't test this - it's running under 3.10 for at least one of the runs. Is it possible that it runs the test in an order that doesn't trigger this issue?

Dec 03 '22 06:12 leos

The CI is utilizing the parallelization flag for pytest. We could update the CI to not use the flag, though we would need to investigate how that changes the run time.

The failure is more a function of how the test was written, which does not reset the module before the subsequent tests execute, than of the code itself.

However, your point is valid that we should ensure the tests execute successfully in either execution method / environment.
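To make the "not resetting the module" point concrete, here is a small sketch of what happens when an entry is deleted from sys.modules and then imported again: Python builds a new module object, so any reference taken earlier now points at a different object than the one in the cache. The stdlib module colorsys is a stand-in and the names are hypothetical; whether this is the root cause of the matplotlib error above is not confirmed in this thread.

```python
import importlib
import sys

# Assumption: "colorsys" stands in for a module removed by the test.
MODULE = "colorsys"

# A reference taken before the cache is touched, as other already-imported
# modules would hold.
old = importlib.import_module(MODULE)

# Remove the cache entry and import again, as the test helper does.
del sys.modules[MODULE]
new = importlib.import_module(MODULE)

# The re-import produced a fresh module object.
identity_changed = new is not old
cache_points_to_new = sys.modules[MODULE] is new

# Put the original object back so the rest of the process is unaffected.
sys.modules[MODULE] = old
```

Tests that run afterwards can therefore see a mix of old and new module objects, which is one way order-dependent failures arise.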

Dec 03 '22 06:12 JGSweets

@leos I tested locally on py3.10 as well and was able to reproduce the error despite the fix I suggested. I'll have to investigate further to get to the root of the issue.

Dec 03 '22 07:12 JGSweets

#736 is a rework that should prevent this error in the future, as we no longer modify the modules themselves.

Dec 03 '22 07:12 JGSweets

Thanks for the fix! Also, should we be using unittest or pytest to run the test suite? pytest is clearly being installed in requirements-test.txt and being used by test-python-package.yml but make test is using unittest?

Would it make sense to have the CI workflows trigger make lint and make test instead of replicating the actual commands, which can go out of sync as they have here?

Dec 03 '22 14:12 leos

Agreed, we should be consistent throughout and use the Makefile.

Dec 03 '22 16:12 JGSweets

With regard to testing, I think we should mimic the CI as well. We should also update the ghpages docs to match wherever they differ. However, we shouldn't restrict users to one runner, as both unittest and pytest should work with the current design.

Dec 03 '22 16:12 JGSweets
