
[testing] refactor test_memory_usage.py

Open icfaust opened this issue 11 months ago • 44 comments

Description

test_memory_usage is another test suite that evaluates all scikit-learn-intelex algorithms. It tests for memory leaks and is useful for general memory-management understanding, but it contains unnecessary code duplication and should follow the other testing standards to minimize maintenance work. Secondly, as GPU algorithm support grows, this test suite provides no useful information for GPU-only (zero-copy) algorithms, since tracemalloc only tracks host allocations. A onedal function is created to query the queue's SYCL device for free global memory; it works in place of tracemalloc's get_traced_memory (albeit in a much cruder way).
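
A rough sketch of the intended probe (the helper name get_free_global_memory and its import location are placeholders here, not the actual function added by this PR):

```python
import tracemalloc


def memory_probe(queue=None):
    """Return a value whose growth across iterations indicates a leak."""
    if queue is not None and queue.sycl_device.is_gpu:
        # Query the free global memory of the queue's SYCL device (requires
        # ZES_ENABLE_SYSMAN=1 with a Level Zero driver); negate it so that a
        # leak shows up as an increase, matching the tracemalloc branch below.
        from onedal import get_free_global_memory  # placeholder name

        return -get_free_global_memory(queue)
    # CPU path: tracemalloc only tracks traced host allocations.
    return tracemalloc.get_traced_memory()[0]
```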

Changes proposed in this pull request:

  • rewrite test_memory_usage to be useful for GPU-only (zero-copy) algorithms
  • generalize and centralize the code to minimize maintenance

Tasks

  • [x] Implement Intel GPU memory monitor
  • [x] rewrite the test functions using functools.partial
  • [x] take estimators from sklearnex/tests/_utils.py
  • [x] evaluate special estimators
  • [x] rewrite memory evaluation to use get_dataframes_and_queues (re-parametrize Fortran- and C-contiguous data separately; see the sketch after this list)
  • [x] pass public CI
  • [ ] pass private CI
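
A rough sketch of the intended parametrization (the import path for get_dataframes_and_queues is assumed from other sklearnex tests, and the shape/order values are illustrative only):

```python
import pytest

from onedal.tests.utils._dataframes_support import get_dataframes_and_queues


@pytest.mark.parametrize("dataframe,queue", get_dataframes_and_queues())
@pytest.mark.parametrize("order", ["C", "F"])     # C- and Fortran-contiguous data
@pytest.mark.parametrize("shape", [(1000, 100)])  # illustrative dataset shape
def test_memory_leaks(dataframe, queue, order, shape):
    # a single generic leak check, shared by all estimators, would run here
    ...
```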

icfaust avatar Mar 26 '24 13:03 icfaust

NOTE to self: ZES_ENABLE_SYSMAN will need to be set when a Level Zero driver is used for this to work; otherwise a RuntimeError will be raised that needs to be caught (and will probably require a pytest.skip).
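
A minimal sketch of that guard (the query argument stands in for the onedal free-memory helper; its real name is not shown here):

```python
import pytest


def _free_gpu_memory_or_skip(queue, query=None):
    """Return free device memory, skipping the test when the query is unsupported."""
    try:
        if query is None:
            raise RuntimeError("no free-memory query available")
        return query(queue)
    except RuntimeError:
        # Raised when ZES_ENABLE_SYSMAN=1 is not set for the Level Zero driver,
        # so the sysman-based free-memory query cannot be used on this runner.
        pytest.skip("free device memory query unavailable (ZES_ENABLE_SYSMAN unset)")
```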

icfaust avatar Mar 26 '24 13:03 icfaust

This PR mostly looks like an enhancement rather than just refactoring :) I am again voting for splitting the refactoring and the enhancements into separate PRs.

Please review #1763; it will greatly simplify this PR.

icfaust avatar Mar 26 '24 19:03 icfaust

/intelci: run

icfaust avatar Mar 27 '24 17:03 icfaust

kneighbors calls via SYCL are causing CI timeouts. Also, it looks like the private CI drivers aren't Level Zero; both of these need to be investigated.

An easy test would be to compare timings of using a dpctl tensor vs. a config_context offload to CPU for the same pytests on a local machine. Will probably do it on an Ice Lake CPU... This will show whether the issue is in the onedal backend or in converting the tensors (which I suspect).
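
A rough sketch of that comparison (dataset size and the estimator choice are illustrative assumptions, not the exact CI setup):

```python
from timeit import default_timer as timer

import dpctl.tensor as dpt
import numpy as np
from sklearnex import config_context
from sklearnex.neighbors import NearestNeighbors

X = np.random.rand(5000, 100).astype(np.float64)

# Path 1: pass a dpctl tensor directly (the zero-copy path under test).
X_gpu = dpt.asarray(X, device="gpu")
start = timer()
NearestNeighbors(n_neighbors=5).fit(X_gpu).kneighbors(X_gpu)
t_tensor = timer() - start

# Path 2: plain NumPy input with target_offload to CPU for comparison.
start = timer()
with config_context(target_offload="cpu"):
    NearestNeighbors(n_neighbors=5).fit(X).kneighbors(X)
t_cpu = timer() - start

print(f"dpctl tensor: {t_tensor:.3f}s, cpu offload: {t_cpu:.3f}s")
```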

icfaust avatar Mar 27 '24 19:03 icfaust

/intelci: run

icfaust avatar Mar 27 '24 20:03 icfaust

This PR has uncovered some rather unsavory behavior with GPU memory. As of now, NO GPU algorithms pass consistently. Either that, or the strategy is flawed.

icfaust avatar Apr 02 '24 12:04 icfaust

/intelci: run

icfaust avatar Apr 02 '24 20:04 icfaust

While CI is green, changes in private CI will need to occur before pulling in this PR. There will likely be a lot of testing needed.

icfaust avatar Apr 03 '24 20:04 icfaust

Private CI has been updated to support ZES_ENABLE_SYSMAN, will see if it runs the new tests.

icfaust avatar Apr 04 '24 08:04 icfaust

/intelci: run

icfaust avatar Apr 04 '24 08:04 icfaust

/intelci: run

icfaust avatar Apr 04 '24 12:04 icfaust

Segfault for ExtraTrees algorithms on GPU; will need to create a reproducer script for it and disable it in this testing.

icfaust avatar Apr 04 '24 21:04 icfaust

/azp run CI

icfaust avatar Apr 05 '24 09:04 icfaust

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Apr 05 '24 09:04 azure-pipelines[bot]

/intelci: run

icfaust avatar Apr 05 '24 10:04 icfaust

#1784 will need to be integrated to solve issues with LocalOutlierFactor(novelty=True) on GPU using dpctl

icfaust avatar Apr 05 '24 13:04 icfaust

The initial fit of the first estimator in the for loop shows up as a memory leak in most of the GPU codepaths. I will need to ask someone with more GPU oneDAL knowledge how things are stored on the GPU after something is run. Something is also weird with dpctl and queues, so I used GPU offloading instead to make things a bit more stable; it may be that the test should be run in a config_context to simplify and focus the results on oneDAL and sklearnex specifically.
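
One possible way to separate that first-fit effect from a genuine leak (purely a sketch; the probe callable and names here are illustrative, not part of this PR):

```python
def measured_growth(make_estimator, X, y, probe, n_iters=10):
    # Warm-up fit: absorb one-time device allocations (kernel caches, etc.)
    # so they are not counted against the iterations below.
    make_estimator().fit(X, y)
    baseline = probe()
    for _ in range(n_iters):
        make_estimator().fit(X, y)  # repeated fits should not keep growing memory
    return probe() - baseline
```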

icfaust avatar Apr 05 '24 16:04 icfaust

/intelci: run

icfaust avatar Apr 06 '24 20:04 icfaust

/intelci: run

icfaust avatar Apr 08 '24 04:04 icfaust

/intelci: run

icfaust avatar Apr 08 '24 06:04 icfaust

[screenshot: CI test failures]

I am going to swap the order of things to see whether these failures are stable. There is a one-test-case difference in failures between Python 3.10 and 3.9, where 3.10 additionally fails tests/test_memory_usage.py::test_gpu_memory_leaks[(1000, 100)-C-LocalOutlierFactor-SyclQueue_GPU]. A Python version dependency is not a good sign.

icfaust avatar Apr 08 '24 08:04 icfaust

/intelci: run

icfaust avatar Apr 08 '24 08:04 icfaust

[screenshot: CI test failures] Will swap the special instances and the standard estimators and rerun with the Fortran-contiguous cases first. This run produced fewer failures. This looks problematic.

icfaust avatar Apr 08 '24 09:04 icfaust

/intelci: run

icfaust avatar Apr 08 '24 09:04 icfaust

[screenshot: CI test results]

icfaust avatar Apr 08 '24 10:04 icfaust

There are some algorithms which consistently pass. I want to replicate these on a local machine to see how deterministic the results are.

icfaust avatar Apr 08 '24 10:04 icfaust

Rerun to check determinism: http://intel-ci.intel.com/eef59a23-df27-f1f4-8546-a4bf010d0e2e

icfaust avatar Apr 08 '24 11:04 icfaust

/intelci: run

icfaust avatar Apr 11 '24 10:04 icfaust

/intelci: run

icfaust avatar Apr 12 '24 19:04 icfaust

/intelci: run

icfaust avatar Apr 16 '24 09:04 icfaust