feature_engine
reduce running time of tests for feature selection module
@solegalli I guess this relates to #592
If we can profile any class or class method, then I think it's trivial to profile the tests as well. Once we have profiles for the tests, we can go and optimize the slow ones.
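Profiling a single class method takes only a few lines. Below is a minimal sketch using the standard library's cProfile. It is swapped in for Pyinstrument here only so the example runs without extra dependencies. SlowSelector is a hypothetical stand-in for a feature-selection class, not part of feature_engine.

```python
import cProfile
import io
import pstats


class SlowSelector:
    """Hypothetical stand-in for a feature-selection class."""

    def fit(self, n_rows):
        # Simulate work that grows with the size of the dataset
        return sum(i * i for i in range(n_rows))


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Show the `top` entries with the highest cumulative time
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


result, report = profile_call(SlowSelector().fit, 100_000)
print(report)
```

The same wrapper works for any bound method, so the slowest methods of a selector can be located before touching the tests.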
What do you think? Should I pick this one as well?
Hey @Okroshiashvili
It is somewhat related. If we make classes more efficient, then the tests will run faster. But I think here we could already gain a lot by decreasing the size of the datasets that we use in the tests.
Recursive feature elimination and recursive feature addition are inherently time-consuming, so speeding up their tests would make our dev work more efficient.
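A rough back-of-the-envelope model helps explain why shrinking the test data pays off for these selectors. The sketch below assumes (hypothetically, as a simplification) one cross-validated fit on all features to rank them, one cross-validated refit per candidate feature, and a per-fit cost roughly proportional to the number of rows. Real estimators often scale worse than linearly, so treat the numbers as illustrative only.

```python
def estimated_model_fits(n_features, cv=3):
    """Rough count of model fits for a recursive selector.

    Hypothetical cost model: one cross-validated fit on all features to
    rank them, then one cross-validated fit per candidate feature.
    """
    return cv * (1 + n_features)


def relative_cost(n_features, n_rows, cv=3):
    # Assumption: each fit costs time roughly proportional to the row count
    return estimated_model_fits(n_features, cv) * n_rows


big = relative_cost(n_features=20, n_rows=1_000)
small = relative_cost(n_features=6, n_rows=100)
print(big, small, round(big / small, 1))  # → 63000 2100 30.0
```

Because the number of fits grows with the number of features and each fit grows with the number of rows, trimming both dimensions of a test dataset compounds the savings.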
Having said this, you are more than welcome to take the 2 issues together!
Okay, sounds good to me. I will handle this issue alongside the other one ☺️
Hi @solegalli
So, as I mentioned in #592, we can use Pyinstrument to profile the tests as well.
I've created a small pytest fixture to profile the tests. Here it is:
```python
from pathlib import Path

import pytest
from pyinstrument import Profiler

# Profiles are written relative to the directory pytest is launched from
TESTS_ROOT = Path.cwd()
PROFILE_ROOT = TESTS_ROOT / "profiles" / "test_profiles"


@pytest.fixture(autouse=True)
def auto_profile(request):
    """Profile every test and save the result as an HTML report."""
    profiler = Profiler()
    profiler.start()

    # Run the test
    yield

    profiler.stop()

    node_id = request.node.nodeid.replace("tests/", "").strip().split("/")
    if len(node_id) == 1:
        results_dir = PROFILE_ROOT
    else:
        # Mirror the sub-directory structure of the test suite
        results_dir = PROFILE_ROOT / "/".join(node_id[:-1])
    # parents=True also creates "profiles/" if it does not exist yet
    results_dir.mkdir(parents=True, exist_ok=True)
    # One HTML file per test, named after the test function
    results_file = results_dir / f"{node_id[-1].split('::')[-1]}.html"

    with open(results_file, "w", encoding="utf-8") as f_html:
        f_html.write(profiler.output_html())
```
Put this fixture inside the conftest.py file in the tests directory and run the tests as you usually do. It will profile every test and write one HTML profile per test, organized in the same folder hierarchy as the test_****.py files.
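To see which file a given test ends up in, the fixture's path logic can be pulled out into a pure function. profile_path_for is a hypothetical helper written for illustration (it is not in the fixture), and the node ID in the usage line is an example.

```python
from pathlib import PurePosixPath


def profile_path_for(nodeid, root="profiles/test_profiles"):
    """Return the HTML path the auto_profile fixture would write for nodeid.

    Mirrors the fixture's logic: drop the "tests/" prefix, keep the
    sub-directories, and name the file after the test function.
    """
    parts = nodeid.replace("tests/", "").strip().split("/")
    directory = PurePosixPath(root, *parts[:-1])
    return directory / f"{parts[-1].split('::')[-1]}.html"


print(profile_path_for("tests/test_selection/test_recursive_feature_elimination.py::test_fit"))
# → profiles/test_profiles/test_selection/test_fit.html
```

Note that the module name is dropped, so two modules in the same folder that define a test with the same name would write to the same HTML file.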
I don't recommend using this in any CI/CD flow at all, because it produces lots of HTML files. It is meant for local use only.
With this in place, I will identify the slow tests and act accordingly: either reduce the size of the mock data or optimize the class behind each slow test.
Sounds good! Thank you so much @Okroshiashvili
You're welcome @solegalli
So, I'll push this issue forward, investigate the slow tests, and update you asap.