scikit-learn-intelex
[testing] refactor test_memory_usage.py
Description
test_memory_usage is another test suite which evaluates all scikit-learn-intelex algorithms. It checks for memory leaks and is also useful for building a general understanding of memory management. It currently contains unnecessary code duplication and should follow the conventions of the other common test suites to minimize maintenance work. Secondly, as more and more algorithms gain GPU support, the suite provides no useful information for those paths, since tracemalloc cannot see device allocations. A onedal function is added to query the queue's SYCL device for free global memory; it works in place of tracemalloc's get_traced_memory, albeit in a much cruder way.
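A minimal sketch of the intended measurement pattern, assuming tracemalloc has been started by the test setup and using `free_device_global_memory(queue)` as a hypothetical stand-in for the new onedal binding (the real name and signature may differ):

```python
import gc
import tracemalloc


def _host_memory_used():
    # Bytes currently traced on the host; assumes tracemalloc.start()
    # has already been called by the test setup.
    return tracemalloc.get_traced_memory()[0]


def _device_memory_used(queue):
    # Crude device-side counterpart: used memory is total global memory
    # minus free global memory reported by the sysman query.
    # free_device_global_memory is a hypothetical name for the new onedal
    # binding; it needs ZES_ENABLE_SYSMAN=1 and a Level Zero driver.
    return queue.sycl_device.global_mem_size - free_device_global_memory(queue)


def memory_growth(run_estimator, n_iters=10, queue=None):
    """Run an estimator repeatedly and report memory growth per iteration."""
    measure = _host_memory_used if queue is None else (lambda: _device_memory_used(queue))
    gc.collect()
    baseline = measure()
    growth = []
    for _ in range(n_iters):
        run_estimator()
        gc.collect()
        growth.append(measure() - baseline)
    return growth
```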
Changes proposed in this pull request:
- rewrite test_memory_usage to be useful for gpu-only (zero copy) algorithms
- generalize and centralize to minimize maintenance
Tasks
- [x] Implement Intel GPU memory monitor
- [x] rewrite the test functions using partial
- [x] take estimators from sklearnex/tests/_utils.py
- [x] evaluate special estimators
- [x] rewrite memory evaluation to use get_dataframes_and_queues (re-parameterize Fortran- and C-contiguous data separately; see the sketch after this list)
- [x] pass public CI
- [ ] pass private CI
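A rough sketch of the parameterization these tasks describe; the import paths, the PATCHED_MODELS dictionary name, and the `_run_memory_check` helper are assumptions standing in for whatever the final code uses:

```python
from functools import partial

import numpy as np
import pytest

from onedal.tests.utils._dataframes_support import (
    _convert_to_dataframe,
    get_dataframes_and_queues,
)
from sklearnex.tests._utils import PATCHED_MODELS


@pytest.mark.parametrize("order", ["C", "F"])  # C- and F-contiguous data separately
@pytest.mark.parametrize("dataframe,queue", get_dataframes_and_queues())
@pytest.mark.parametrize("name", PATCHED_MODELS.keys())
def test_memory_leaks(name, dataframe, queue, order):
    X = np.asarray(np.random.rand(1000, 100), order=order)
    X = _convert_to_dataframe(X, sycl_queue=queue, target_df=dataframe)
    # partial() binds the estimator class and data once, so one generic
    # measurement helper can be reused for every algorithm instead of
    # duplicating per-estimator test functions.
    check = partial(_run_memory_check, PATCHED_MODELS[name], X)
    check(queue=queue)
```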
NOTE to self: ZES_ENABLE_SYSMAN will need to be set when using a Level Zero driver to make this work; otherwise a RuntimeError will be raised that will need to be caught (and will probably require a pytest.skip).
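A possible guard for this, reusing the hypothetical `free_device_global_memory` binding from the sketch above:

```python
import os

import pytest


def _device_memory_or_skip(queue):
    # Skip rather than fail when the sysman query cannot work on this setup.
    if os.environ.get("ZES_ENABLE_SYSMAN") != "1":
        pytest.skip("ZES_ENABLE_SYSMAN must be set to query free GPU memory")
    try:
        return free_device_global_memory(queue)  # hypothetical onedal binding
    except RuntimeError:
        pytest.skip("free device memory query not supported by this driver")
```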
This PR mostly looks like an enhancement rather than just refactoring :) I again vote for splitting refactoring and enhancements into separate PRs.
Please review #1763; it will greatly simplify this PR.
/intelci: run
kneighbors calls via SYCL are causing CI timeouts. Also, it looks like the private CI drivers aren't Level Zero; both of these need to be investigated.
An easy test would be to compare the timing of the same pytests using a dpctl tensor versus a config_context targeting the CPU on a local machine. Will probably do it on an Ice Lake CPU... This will show whether the issue is in the onedal backend or in converting the tensors (which I suspect).
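Something like the following comparison, assuming a SYCL GPU device is available locally (otherwise the dpctl path could target the CPU SYCL device instead) and using kneighbors via KNeighborsClassifier as the slow case:

```python
import time

import dpctl.tensor as dpt
import numpy as np

from sklearnex import config_context
from sklearnex.neighbors import KNeighborsClassifier

X = np.random.rand(1000, 100)
y = (np.random.rand(1000) > 0.5).astype(np.int64)

# Path 1: zero-copy dpctl tensors already on the SYCL device.
X_d, y_d = dpt.asarray(X, device="gpu"), dpt.asarray(y, device="gpu")
t0 = time.perf_counter()
KNeighborsClassifier().fit(X_d, y_d).predict(X_d)
tensor_time = time.perf_counter() - t0

# Path 2: plain numpy arrays with offload pinned to CPU.
t0 = time.perf_counter()
with config_context(target_offload="cpu"):
    KNeighborsClassifier().fit(X, y).predict(X)
cpu_time = time.perf_counter() - t0

print(f"dpctl tensor: {tensor_time:.3f}s, config_context cpu: {cpu_time:.3f}s")
```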
/intelci: run
This PR has uncovered some rather unsavory stuff with GPU memory. As of now, NO GPU algorithms pass consistently. Either that or the strategy is flawed.
/intelci: run
While CI is green, changes to the private CI will be needed before pulling in this PR. There will likely be a lot of testing required.
Private CI has been updated to support ZES_ENABLE_SYSMAN; we'll see if it runs the new tests.
/intelci: run
/intelci: run
Segfault in the ExtraTrees algorithms on GPU; will need to create a reproducer script for it and disable it in this testing.
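A sketch of what the reproducer might look like; the data shape, estimator parameters, and repeat count are guesses and may need adjustment to trigger the crash:

```python
import numpy as np

from sklearnex import config_context
from sklearnex.ensemble import ExtraTreesClassifier

X = np.random.rand(1000, 100)
y = (np.random.rand(1000) > 0.5).astype(np.int64)

# Repeated fit/predict on GPU-offloaded data, mimicking the memory test loop.
with config_context(target_offload="gpu"):
    for _ in range(10):
        ExtraTreesClassifier(n_estimators=10).fit(X, y).predict(X)
```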
/azp run CI
Azure Pipelines successfully started running 1 pipeline(s).
/intelci: run
#1784 will need to be integrated to solve issues with LocalOutlierFactor(novelty=True) on GPU using dpctl.
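For reference, the combination in question looks roughly like this (shapes arbitrary, GPU device assumed):

```python
import dpctl.tensor as dpt
import numpy as np

from sklearnex.neighbors import LocalOutlierFactor

X = dpt.asarray(np.random.rand(1000, 100), device="gpu")
lof = LocalOutlierFactor(novelty=True).fit(X)
lof.predict(X)  # novelty=True enables predict on new data
```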
The initial fit of the first estimator in the for loop causes a memory leak in most of the GPU code paths. I will need to ask someone with more GPU oneDAL knowledge how things are stored on the GPU after an algorithm is run. Something is also weird with dpctl and queues, so I used GPU offloading to make things a bit more stable. It may be best to run the test in a config_context to keep the results focused on oneDAL and sklearnex specifically.
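One possible shape for this, with a warm-up fit before the baseline so first-run allocations are excluded; `device_memory_used(queue)` is again a hypothetical stand-in for the device memory query:

```python
import gc

from sklearnex import config_context


def check_gpu_leak(estimator_cls, X, y, queue, n_iters=10):
    with config_context(target_offload="gpu"):
        # Warm-up fit outside the measured loop so first-run allocations
        # (kernel/program caches, etc.) are not counted as a leak.
        estimator_cls().fit(X, y)
        gc.collect()
        baseline = device_memory_used(queue)  # hypothetical device query
        for _ in range(n_iters):
            estimator_cls().fit(X, y)
            gc.collect()
        return device_memory_used(queue) - baseline
```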
/intelci: run
/intelci: run
/intelci: run
I am going to swap the order of things to see whether these failures are stable. There is a one-test-case difference between Python 3.10 and 3.9: 3.10 also fails tests/test_memory_usage.py::test_gpu_memory_leaks[(1000, 100)-C-LocalOutlierFactor-SyclQueue_GPU]. A Python version dependency is not a good sign.
/intelci: run
Will swap the special instances and the standard estimators and rerun with Fortran order first. That case brought fewer failures. This looks problematic.
/intelci: run
There are some algorithms which consistently pass. I want to replicate these on a local machine to see how deterministic the results are.
Rerun to check determinism: http://intel-ci.intel.com/eef59a23-df27-f1f4-8546-a4bf010d0e2e
/intelci: run
/intelci: run
/intelci: run