LightGBM
[ci] [python-package] Python tests leave files behind
Description
The Python unit tests in this project leave some files behind when they are done running.
They should be modified to use Python-managed temporary files that are automatically removed, so that:
- successive test runs don't accidentally rely on outputs from previous runs
- files aren't left behind on developers' local systems
Reproducible example
Build the Python package and run the Python tests.
cmake -B build -S .
cmake --build build --target _lightgbm
sh build-python.sh install --precompile
pytest tests/python_package_test
(For more details on this, see #6350.)
Look at the files created.
git status --ignored
As of latest master (https://github.com/microsoft/LightGBM/commit/b27d81ea411d04d8d071d4d4e75c19ffa15c5795), you'll see all of these created by tests:
categorical.model
lgb.model
lgb.pkl
lgb_train_data.bin
model.txt
Tree4.gv.pdf
Tree4.gv
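If git is not convenient for this check, one alternative is to compare the directory listing before and after a test run. The following is only a rough sketch. The helper names snapshot_files and new_files are hypothetical and are not part of LightGBM or pytest.

```python
import tempfile
from pathlib import Path


def snapshot_files(root: Path) -> set[Path]:
    """Return every file under root, as paths relative to root."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def new_files(before: set[Path], after: set[Path]) -> list[Path]:
    """Files present after the run that were not there before it."""
    return sorted(after - before)


if __name__ == "__main__":
    # Demo in a throwaway directory. In practice, root would be the repo
    # checkout and the step in the middle would be the pytest invocation.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        before = snapshot_files(root)
        (root / "lgb.model").write_text("stand-in for a file a test left behind")
        after = snapshot_files(root)
        print(new_files(before, after))
```

Note that on a real checkout this would also pick up build outputs and anything written under .git, so the result would need filtering.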
Approach
Find the tests that created those files, and ensure that they stop creating them.
For example, it looks like lgb.model probably comes from here:
https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L1373
That could be avoided using pytest's tmp_path fixture, like this:
https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L727
https://github.com/microsoft/LightGBM/blob/b27d81ea411d04d8d071d4d4e75c19ffa15c5795/tests/python_package_test/test_engine.py#L744
For more on how that works, see "How to use temporary directories and files in tests" (pytest docs).
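As a minimal sketch of the tmp_path pattern, assuming a hypothetical save_model helper standing in for whatever call writes the file in the real test (for example, something like Booster.save_model()):

```python
from pathlib import Path


def save_model(model_text: str, path: Path) -> None:
    """Hypothetical stand-in for the code under test that writes a model file."""
    path.write_text(model_text)


def test_save_model_roundtrip(tmp_path: Path) -> None:
    # pytest injects tmp_path: a pathlib.Path pointing at a fresh,
    # per-test temporary directory that pytest manages and cleans up.
    model_file = tmp_path / "lgb.model"

    # Write into the temporary directory instead of the working directory.
    save_model("stand-in model contents\n", model_file)

    assert model_file.read_text() == "stand-in model contents\n"
```

The only change a real test should need is to build the output path from tmp_path instead of passing a bare filename like "lgb.model", which resolves against the current working directory.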
Additional Comments
You do not need to put up a pull request fixing all of these! Contributions that fix any of these would be welcomed.
This list will be updated as these are fixed:
- [x] categorical.model (#6590)
- [ ] data_dask.csv
- [x] lgb.model (#6518)
- [x] lgb.pkl (#6518)
- [x] lgb_train_data.bin (#6606)
- [x] model.txt (#6590)
- [x] train.binary.bin (#6606)
- [ ] Tree4.gv.pdf
- [ ] Tree4.gv
If you are interested in working on this, comment here to indicate that and to ask for help if you need it.
Hi @jameslamb ,
I'm new to open source and would like to take up this issue.
Thanks!
Sure, thanks! @ me here if you have any questions.
Hey @jameslamb,
I have encountered a few issues while building the Python package. However, I have managed to build it successfully now. But, I am facing some errors while running the tests. I am not able to find the requirements.txt file. Can you suggest any way to install all the necessary modules?
Best, Shrikanth
Errors after running pytest tests/python_package_test
==================================================== test session starts =====================================================
platform darwin -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /Users/hitro/Desktop/Microsoft/LightGBM
collected 3 items / 9 errors
=========================================================== ERRORS ===========================================================
__________________________________ ERROR collecting tests/python_package_test/test_arrow.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_arrow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_arrow.py:6: in <module>
import pyarrow as pa
E ModuleNotFoundError: No module named 'pyarrow'
__________________________________ ERROR collecting tests/python_package_test/test_basic.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_basic.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_basic.py:12: in <module>
from sklearn.datasets import dump_svmlight_file, load_svmlight_file
E ModuleNotFoundError: No module named 'sklearn'
________________________________ ERROR collecting tests/python_package_test/test_callback.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_callback.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_callback.py:6: in <module>
from .utils import SERIALIZERS, pickle_and_unpickle_object
tests/python_package_test/utils.py:6: in <module>
import cloudpickle
E ModuleNotFoundError: No module named 'cloudpickle'
_______________________________ ERROR collecting tests/python_package_test/test_consistency.py _______________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_consistency.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_consistency.py:5: in <module>
from sklearn.datasets import load_svmlight_file
E ModuleNotFoundError: No module named 'sklearn'
__________________________________ ERROR collecting tests/python_package_test/test_dask.py ___________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_dask.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_dask.py:14: in <module>
from sklearn.metrics import accuracy_score, r2_score
E ModuleNotFoundError: No module named 'sklearn'
__________________________________ ERROR collecting tests/python_package_test/test_dual.py ___________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_dual.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_dual.py:8: in <module>
from sklearn.metrics import log_loss
E ModuleNotFoundError: No module named 'sklearn'
_________________________________ ERROR collecting tests/python_package_test/test_engine.py __________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_engine.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_engine.py:15: in <module>
import psutil
E ModuleNotFoundError: No module named 'psutil'
________________________________ ERROR collecting tests/python_package_test/test_plotting.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_plotting.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_plotting.py:3: in <module>
import pandas as pd
E ModuleNotFoundError: No module named 'pandas'
_________________________________ ERROR collecting tests/python_package_test/test_sklearn.py _________________________________
ImportError while importing test module '/Users/hitro/Desktop/Microsoft/LightGBM/tests/python_package_test/test_sklearn.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/importlib/__init__.py:90: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/python_package_test/test_sklearn.py:9: in <module>
import joblib
E ModuleNotFoundError: No module named 'joblib'
================================================== short test summary info ===================================================
ERROR tests/python_package_test/test_arrow.py
ERROR tests/python_package_test/test_basic.py
ERROR tests/python_package_test/test_callback.py
ERROR tests/python_package_test/test_consistency.py
ERROR tests/python_package_test/test_dask.py
ERROR tests/python_package_test/test_dual.py
ERROR tests/python_package_test/test_engine.py
ERROR tests/python_package_test/test_plotting.py
ERROR tests/python_package_test/test_sklearn.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 9 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
===================================================== 9 errors in 3.31s ======================================================
Thanks for trying it out!
Please post error messages and logs as plaintext, not images, so they can be found from search engines. See these resources:
- https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors
- https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks
"any way to install all the necessary modules"
Follow these steps (but add pyarrow): https://github.com/microsoft/LightGBM/pull/6310#issuecomment-1953487883
Sorry about that! I've updated my comment.
Thanks for the information. I'll start working on it 😄
I found another one generated by the Dask tests, added it above.
https://github.com/microsoft/LightGBM/blob/631e0a2a7bdd694a91f30378fb271d05ce438122/tests/python_package_test/test_dask.py#L1534
@Hitro147 Are you still interested in pursuing this?
Hello @jameslamb,
I'm facing some issues with my current environment and will need time to resolve them, so I need to put this on hold for a while. If it's still open later, I'd like to return to it when I have more time. Feel free to assign this to someone else who is interested.
Thanks for giving me this opportunity! 😄
Ok sure, no problem. Comment here or on #6350 any time if you need help.
Anyone else reading this... you are welcome to contribute! A PR even just eliminating one of these left-behind files would be greatly appreciated 😊
@jameslamb, I would like to contribute to this issue, or any related good first issue (as there are multiple mentioned), here in the repository
Sure! This is a great issue to start with @Arup-Chauhan .
I recommend focusing on a single file like categorical.model in your first contribution, to get used to the process. You can find where it's used like this:
git grep -E 'categorical\.model'
Thanks for spending some time on LightGBM, we really appreciate it!
Hi @jameslamb , thanks for this, I will get started, will reach out to you if I need assistance