[ci] [python-package] Python tests leave files behind

Open jameslamb opened this issue 11 months ago • 12 comments

Description

The Python unit tests in this project leave some files behind when they are done running.

They should be modified to use Python-managed temporary files that are automatically removed, so that:

  • successive test runs don't accidentally rely on outputs from previous runs
  • files aren't left behind on developers' local systems

Reproducible example

Build the Python package and run the Python tests.

cmake -B build -S .
cmake --build build --target _lightgbm
sh build-python.sh install --precompile
pytest tests/python_package_test

(for more details on this, see #6350).

Look at the files created.

git status --ignored

As of latest master (https://github.com/microsoft/LightGBM/commit/b27d81ea411d04d8d071d4d4e75c19ffa15c5795), you'll see all of these created by tests:

categorical.model
lgb.model
lgb.pkl
lgb_train_data.bin
model.txt
Tree4.gv.pdf
Tree4.gv
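
Instead of reading the `git status` output by hand, one way to check for leftovers is to snapshot a directory before and after a run and compare the two sets of files. The sketch below only illustrates that idea; `list_files` and `files_left_behind` are hypothetical helper names, not part of LightGBM or pytest.

```python
from pathlib import Path
from typing import Callable, Set


def list_files(root: Path) -> Set[Path]:
    """Return every regular file under root, as paths relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def files_left_behind(root: Path, action: Callable[[], None]) -> Set[Path]:
    """Run action, then return files that exist afterwards but did not before."""
    before = list_files(root)
    action()
    return list_files(root) - before
```

Here `action` could be any callable that runs the code under inspection. An empty result means nothing new was written under `root`.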

Approach

Find the tests that created those files, and ensure that they stop creating them.

For example, it looks like lgb.model probably comes from here:

https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L1373

And that could be avoided using pytest's tmp_path fixture, like this:

https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L727

https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L744

For more on how that works, see "How to use temporary directories and files in tests" (pytest docs).
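
As a rough sketch of that pattern (not the code from the linked lines): a test that requests `tmp_path` receives a unique `pathlib.Path` directory from pytest and can write its outputs there instead of into the working directory. `write_model_file` below is a hypothetical stand-in for whatever call writes the file in the real test (for example, saving a model to disk), and the file contents are made up.

```python
from pathlib import Path


def write_model_file(path: Path) -> None:
    # Hypothetical stand-in for the code under test that writes a model file.
    path.write_text("placeholder model contents\n")


def test_model_file_written_to_tmp_path(tmp_path: Path) -> None:
    # pytest injects tmp_path: a fresh temporary directory unique to this test.
    model_path = tmp_path / "lgb.model"
    write_model_file(model_path)

    # The file lives inside the temporary directory, not the repository root.
    assert model_path.parent == tmp_path
    assert model_path.read_text() == "placeholder model contents\n"
```

Because the path is built from `tmp_path`, the test no longer depends on a hard-coded file name in the current directory.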

Additional Comments

You do not need to put up a pull request fixing all of these! Contributions that fix any of these would be welcomed.

This list will be updated as these are fixed:

  • [x] categorical.model (#6590)
  • [ ] data_dask.csv
  • [x] lgb.model (#6518)
  • [x] lgb.pkl (#6518)
  • [x] lgb_train_data.bin (#6606)
  • [x] model.txt (#6590)
  • [x] train.binary.bin (#6606)
  • [ ] Tree4.gv.pdf
  • [ ] Tree4.gv

If you are interested in working on this, comment here to indicate that and to ask for help if you need it.

jameslamb avatar Mar 15 '24 01:03 jameslamb

Hi @jameslamb ,

I'm new to open source and would like to take up this issue.

Thanks!

Hitro147 avatar Mar 15 '24 12:03 Hitro147

Sure, thanks! @ me here if you have any questions.

jameslamb avatar Mar 15 '24 12:03 jameslamb

Hey @jameslamb,

I ran into a few issues while building the Python package, but I have managed to build it successfully now. However, I am getting errors while running the tests, and I am not able to find a requirements.txt file. Can you suggest any way to install all the necessary modules?

Best, Shrikanth

Errors after running pytest tests/python_package_test

==================================================== test session starts =====================================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/hitro/Desktop/Microsoft/LightGBM
collected 3 items / 9 errors                                                                                                 

=========================================================== ERRORS ===========================================================
__________________________________ ERROR collecting tests/python_package_test/test_arrow.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_arrow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_arrow.py:6: in <module>
    import pyarrow as pa
E   ModuleNotFoundError: No module named 'pyarrow'
__________________________________ ERROR collecting tests/python_package_test/test_basic.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_basic.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_basic.py:12: in <module>
    from sklearn.datasets import dump_svmlight_file, load_svmlight_file
E   ModuleNotFoundError: No module named 'sklearn'
________________________________ ERROR collecting tests/python_package_test/test_callback.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_callback.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_callback.py:6: in <module>
    from .utils import SERIALIZERS, pickle_and_unpickle_object
tests/python_package_test/utils.py:6: in <module>
    import cloudpickle
E   ModuleNotFoundError: No module named 'cloudpickle'
_______________________________ ERROR collecting tests/python_package_test/test_consistency.py _______________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_consistency.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_consistency.py:5: in <module>
    from sklearn.datasets import load_svmlight_file
E   ModuleNotFoundError: No module named 'sklearn'
__________________________________ ERROR collecting tests/python_package_test/test_dask.py ___________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_dask.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_dask.py:14: in <module>
    from sklearn.metrics import accuracy_score, r2_score
E   ModuleNotFoundError: No module named 'sklearn'
__________________________________ ERROR collecting tests/python_package_test/test_dual.py ___________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_dual.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_dual.py:8: in <module>
    from sklearn.metrics import log_loss
E   ModuleNotFoundError: No module named 'sklearn'
_________________________________ ERROR collecting tests/python_package_test/test_engine.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_engine.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_engine.py:15: in <module>
    import psutil
E   ModuleNotFoundError: No module named 'psutil'
________________________________ ERROR collecting tests/python_package_test/test_plotting.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_plotting.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_plotting.py:3: in <module>
    import pandas as pd
E   ModuleNotFoundError: No module named 'pandas'
_________________________________ ERROR collecting tests/python_package_test/test_sklearn.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_sklearn.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_sklearn.py:9: in <module>
    import joblib
E   ModuleNotFoundError: No module named 'joblib'
================================================== short test summary info ===================================================
ERROR tests/python_package_test/test_arrow.py
ERROR tests/python_package_test/test_basic.py
ERROR tests/python_package_test/test_callback.py
ERROR tests/python_package_test/test_consistency.py
ERROR tests/python_package_test/test_dask.py
ERROR tests/python_package_test/test_dual.py
ERROR tests/python_package_test/test_engine.py
ERROR tests/python_package_test/test_plotting.py
ERROR tests/python_package_test/test_sklearn.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 9 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================== 9 errors in 3.31s ======================================================

Hitro147 avatar Mar 19 '24 14:03 Hitro147

Thanks for trying it out!

Please post error messages and logs as plaintext, not images, so they can be found by search engines. See these resources:

  • https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors
  • https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks

> any way to install all the necessary modules

Follow these steps (but add pyarrow): https://github.com/microsoft/LightGBM/pull/6310#issuecomment-1953487883

jameslamb avatar Mar 19 '24 14:03 jameslamb

Sorry about that! I've updated my comment.

Thanks for the information. I'll start working on it 😄

Hitro147 avatar Mar 19 '24 14:03 Hitro147

I found another one generated by the Dask tests, added it above.

https://github.com/microsoft/LightGBM/blob/631e0a2a7bdd694a91f30378fb271d05ce438122/tests/python_package_test/test_dask.py#L1534

jameslamb avatar Mar 20 '24 03:03 jameslamb

@Hitro147 Are you still interested in pursuing this?

jameslamb avatar Apr 24 '24 03:04 jameslamb

Hello @jameslamb,

I'm facing some issues with my current environment, and I'll need time to resolve them. I need to put this on hold for a while; if the issue is still open later, I'd like to return to it when I have more time. Feel free to assign this to someone else if they are interested.

Thanks for giving me this opportunity! 😄

Hitro147 avatar Apr 24 '24 05:04 Hitro147

Ok sure, no problem. Comment here or on #6350 any time if you need help.

Anyone else reading this... you are welcome to contribute! A PR even just eliminating one of these left-behind files would be greatly appreciated 😊

jameslamb avatar Apr 26 '24 22:04 jameslamb

@jameslamb, I would like to contribute to this issue, or to any related good first issue (there are multiple mentioned) here in the repository.

Arup-Chauhan avatar Jun 13 '24 20:06 Arup-Chauhan

Sure! This is a great issue to start with @Arup-Chauhan .

I recommend focusing on a single file like categorical.model in your first contribution, to get used to the process. You can find where it's used like this:

git grep -E 'categorical\.model'

Thanks for spending some time on LightGBM, we really appreciate it!

jameslamb avatar Jun 13 '24 21:06 jameslamb

Hi @jameslamb, thanks for this. I will get started and will reach out to you if I need assistance.

Arup-Chauhan avatar Jun 13 '24 22:06 Arup-Chauhan