
[testing] refactor test_memory_usage.py

Open icfaust opened this issue 11 months ago • 44 comments

Description

test_memory_usage is another test suite that evaluates all scikit-learn-intelex algorithms. It tests for memory leaks and is useful for general memory-management understanding, but it contains unnecessary code duplication and should follow the other testing standards to minimize maintenance work. Secondly, as GPU algorithm support grows, this test suite provides no useful information for GPU-only (zero-copy) algorithms, since tracemalloc only tracks host allocations. A onedal function is created to query the queue's SYCL device for free global memory; it works in place of tracemalloc's get_traced_memory (albeit in a much cruder way).
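
A rough sketch of the intended probe (the helper name get_free_global_memory and its import location are placeholders here, not the actual function added by this PR):

```python
import tracemalloc


def memory_probe(queue=None):
    """Return a value whose growth across iterations indicates a leak."""
    if queue is not None and queue.sycl_device.is_gpu:
        # Query the free global memory of the queue's SYCL device (requires
        # ZES_ENABLE_SYSMAN=1 with a Level Zero driver); negate it so that a
        # leak shows up as an increase, matching the tracemalloc branch below.
        from onedal import get_free_global_memory  # placeholder name

        return -get_free_global_memory(queue)
    # CPU path: tracemalloc only tracks traced host allocations.
    return tracemalloc.get_traced_memory()[0]
```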

Changes proposed in this pull request:

  • rewrite test_memory_usage to be useful for GPU-only (zero-copy) algorithms
  • generalize and centralize the code to minimize maintenance

Tasks

  • [x] Implement Intel GPU memory monitor
  • [x] rewrite the test functions using functools.partial
  • [x] take estimators from sklearnex/tests/_utils.py
  • [x] evaluate special estimators
  • [x] rewrite memory evaluation to use get_dataframes_and_queues (re-parametrize Fortran- and C-contiguous data separately; see the sketch after this list)
  • [x] pass public CI
  • [ ] pass private CI
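
A rough sketch of the intended parametrization (the import path for get_dataframes_and_queues is assumed from other sklearnex tests, and the shape/order values are illustrative only):

```python
import pytest

from onedal.tests.utils._dataframes_support import get_dataframes_and_queues


@pytest.mark.parametrize("dataframe,queue", get_dataframes_and_queues())
@pytest.mark.parametrize("order", ["C", "F"])     # C- and Fortran-contiguous data
@pytest.mark.parametrize("shape", [(1000, 100)])  # illustrative dataset shape
def test_memory_leaks(dataframe, queue, order, shape):
    # a single generic leak check, shared by all estimators, would run here
    ...
```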

icfaust avatar Mar 26 '24 13:03 icfaust

NOTE to self: ZES_ENABLE_SYSMAN will need to be set when a Level Zero driver is used for this to work; otherwise a RuntimeError will be raised that needs to be caught (and will probably require a pytest.skip).
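
A minimal sketch of that guard (the query argument stands in for the onedal free-memory helper; its real name is not shown here):

```python
import pytest


def _free_gpu_memory_or_skip(queue, query=None):
    """Return free device memory, skipping the test when the query is unsupported."""
    try:
        if query is None:
            raise RuntimeError("no free-memory query available")
        return query(queue)
    except RuntimeError:
        # Raised when ZES_ENABLE_SYSMAN=1 is not set for the Level Zero driver,
        # so the sysman-based free-memory query cannot be used on this runner.
        pytest.skip("free device memory query unavailable (ZES_ENABLE_SYSMAN unset)")
```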

icfaust avatar Mar 26 '24 13:03 icfaust

This PR mostly looks like an enhancement rather than just refactoring :) I am again voting for splitting the refactoring and the enhancements into separate PRs.

Please review #1763; it will greatly simplify this PR.

icfaust avatar Mar 26 '24 19:03 icfaust

/intelci: run

icfaust avatar Mar 27 '24 17:03 icfaust

kneighbors calls via SYCL are causing CI timeouts. Also, it looks like the private CI drivers aren't Level Zero; both of these need to be investigated.

An easy test would be to compare timings of using a dpctl tensor vs. a config_context offload to CPU for the same pytests on a local machine. Will probably do it on an Ice Lake CPU... This will show whether the issue is in the onedal backend or in converting the tensors (which I suspect).
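
A rough sketch of that comparison (dataset size and the estimator choice are illustrative assumptions, not the exact CI setup):

```python
from timeit import default_timer as timer

import dpctl.tensor as dpt
import numpy as np
from sklearnex import config_context
from sklearnex.neighbors import NearestNeighbors

X = np.random.rand(5000, 100).astype(np.float64)

# Path 1: pass a dpctl tensor directly (the zero-copy path under test).
X_gpu = dpt.asarray(X, device="gpu")
start = timer()
NearestNeighbors(n_neighbors=5).fit(X_gpu).kneighbors(X_gpu)
t_tensor = timer() - start

# Path 2: plain NumPy input with target_offload to CPU for comparison.
start = timer()
with config_context(target_offload="cpu"):
    NearestNeighbors(n_neighbors=5).fit(X).kneighbors(X)
t_cpu = timer() - start

print(f"dpctl tensor: {t_tensor:.3f}s, cpu offload: {t_cpu:.3f}s")
```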

icfaust avatar Mar 27 '24 19:03 icfaust

/intelci: run

icfaust avatar Mar 27 '24 20:03 icfaust

This PR has uncovered some rather unsavory behavior with GPU memory. As of now, NO GPU algorithms pass consistently. Either that, or the strategy is flawed.

icfaust avatar Apr 02 '24 12:04 icfaust

/intelci: run

icfaust avatar Apr 02 '24 20:04 icfaust

While CI is green, changes in private CI will need to occur before pulling in this PR. There will likely be a lot of testing needed.

icfaust avatar Apr 03 '24 20:04 icfaust

Private CI has been updated to support ZES_ENABLE_SYSMAN, will see if it runs the new tests.

icfaust avatar Apr 04 '24 08:04 icfaust

/intelci: run

icfaust avatar Apr 04 '24 08:04 icfaust

/intelci: run

icfaust avatar Apr 04 '24 12:04 icfaust

Segfault for ExtraTrees algorithms on GPU; will need to create a reproducer script for it and disable it in this testing.

icfaust avatar Apr 04 '24 21:04 icfaust

/azp run CI

icfaust avatar Apr 05 '24 09:04 icfaust

Azure Pipelines successfully started running 1 pipeline(s).

azure-pipelines[bot] avatar Apr 05 '24 09:04 azure-pipelines[bot]

/intelci: run

icfaust avatar Apr 05 '24 10:04 icfaust

#1784 will need to be integrated to solve issues with LocalOutlierFactor(novelty=True) on GPU using dpctl

icfaust avatar Apr 05 '24 13:04 icfaust

The initial fit of the first estimator in the for loop shows up as a memory leak in most of the GPU codepaths. I will need to ask someone with more GPU oneDAL knowledge how things are stored on the GPU after something is run. Something is also weird with dpctl and queues, so I used GPU offloading instead to make things a bit more stable; it may be that the test should be run in a config_context to simplify and focus the results on oneDAL and sklearnex specifically.
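
One possible way to separate that first-fit effect from a genuine leak (purely a sketch; the probe callable and names here are illustrative, not part of this PR):

```python
def measured_growth(make_estimator, X, y, probe, n_iters=10):
    # Warm-up fit: absorb one-time device allocations (kernel caches, etc.)
    # so they are not counted against the iterations below.
    make_estimator().fit(X, y)
    baseline = probe()
    for _ in range(n_iters):
        make_estimator().fit(X, y)  # repeated fits should not keep growing memory
    return probe() - baseline
```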

icfaust avatar Apr 05 '24 16:04 icfaust

/intelci: run

icfaust avatar Apr 06 '24 20:04 icfaust

/intelci: run

icfaust avatar Apr 08 '24 04:04 icfaust

/intelci: run

icfaust avatar Apr 08 '24 06:04 icfaust

[screenshot: CI test failures]

I am going to swap the order of things to see whether these failures are stable. There is a one-test-case difference in failures between Python 3.10 and 3.9, where 3.10 additionally fails tests/test_memory_usage.py::test_gpu_memory_leaks[(1000, 100)-C-LocalOutlierFactor-SyclQueue_GPU]. A Python version dependency is not a good sign.

icfaust avatar Apr 08 '24 08:04 icfaust

/intelci: run

icfaust avatar Apr 08 '24 08:04 icfaust

[screenshot: CI test failures] Will swap the special instances and the standard estimators and rerun with the Fortran-contiguous cases first. This run produced fewer failures. This looks problematic.

icfaust avatar Apr 08 '24 09:04 icfaust

/intelci: run

icfaust avatar Apr 08 '24 09:04 icfaust

[screenshot: CI test results]

icfaust avatar Apr 08 '24 10:04 icfaust

There are some algorithms which consistently pass. I want to replicate these on a local machine to see how deterministic the results are.

icfaust avatar Apr 08 '24 10:04 icfaust

Rerun to check determinism: http://intel-ci.intel.com/eef59a23-df27-f1f4-8546-a4bf010d0e2e

icfaust avatar Apr 08 '24 11:04 icfaust

/intelci: run

icfaust avatar Apr 11 '24 10:04 icfaust

/intelci: run

icfaust avatar Apr 12 '24 19:04 icfaust

/intelci: run

icfaust avatar Apr 16 '24 09:04 icfaust