Add implicit NumPy conversion for dpctl.tensor.usm_ndarray types

Open icfaust opened this issue 5 months ago • 4 comments

The functionality introduced in #1964 can be better optimized by moving code from _copy_utils to the usm_ndarray itself. This enables seamless integration into larger codebases such as scikit-learn, where use of asarray is common. It attempts to solve #2129.
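
For context, NumPy's `asarray` looks for an `__array__` method on objects it does not otherwise recognize, which is the hook an implicit conversion would rely on. The following is only a minimal sketch: `ToyDeviceArray` is a hypothetical stand-in (not dpctl's `usm_ndarray` implementation), and a NumPy buffer plays the role of device memory.

```python
import numpy as np


class ToyDeviceArray:
    """Hypothetical stand-in for a device array; not dpctl code."""

    def __init__(self, data):
        # A host buffer plays the role of device (USM) memory here.
        self._buf = np.array(data)

    def __array__(self, dtype=None, copy=None):
        # In a real device array this step would copy device data to the host.
        host = self._buf.copy()
        return host if dtype is None else host.astype(dtype)


x = ToyDeviceArray([1.0, 2.0, 3.0])
host = np.asarray(x)  # succeeds without any explicit conversion call
print(type(host).__name__, host.shape)
```

The explicit counterpart in dpctl would be a call such as `dpctl.tensor.asnumpy(x)`, which makes the device-to-host copy visible at the call site.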

  • [x] Have you provided a meaningful PR description?
  • [ ] Have you added a test, reproducer or referred to an issue with a reproducer?
  • [ ] Have you tested your changes locally for CPU and GPU devices?
  • [ ] Have you made sure that new changes do not introduce compiler warnings?
  • [ ] Have you checked performance impact of proposed changes?
  • [x] Have you added documentation for your changes, if necessary?
  • [x] Have you added your changes to the changelog?
  • [x] If this PR is a work in progress, are you opening the PR as a draft?

icfaust avatar Aug 05 '25 08:08 icfaust

Coverage Status

coverage: 85.903% (+0.03%) from 85.878% when pulling 1b8b30b3cc443fd0ab6b2ba76bc0b70b6a74e807 on icfaust:dev/array_fix into 3381d14a62e5fd6124ed4c94ff8c6f6aa307e27a on IntelPython:master.

coveralls avatar Aug 05 '25 10:08 coveralls

@ndgrigorian I am unable to check the Jenkins CI checks. Is there any way someone from your team can help me?

icfaust avatar Aug 05 '25 23:08 icfaust

> @ndgrigorian I am unable to check the Jenkins CI checks. Is there any way someone from your team can help me?

I can help with that, but there are still concerns with this PR that need to be addressed, some of which I mentioned in the issue.

If this PR were modified to accept an environment variable that permits implicit conversion, that would be acceptable, but implicit conversion under all circumstances should not be allowed.
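
One way such an opt-in could look is sketched below. This is an assumption-laden illustration, not a proposal for dpctl's actual API: the variable name `TOY_ALLOW_IMPLICIT_NUMPY`, the class `ToyDeviceArray`, and its `to_numpy` method are all hypothetical, and a NumPy buffer stands in for device memory.

```python
import os

import numpy as np

# Hypothetical variable name, chosen for illustration only.
_ENV_FLAG = "TOY_ALLOW_IMPLICIT_NUMPY"


class ToyDeviceArray:
    """Hypothetical stand-in for a device array; not dpctl code."""

    def __init__(self, data):
        self._buf = np.array(data)

    def to_numpy(self):
        """Explicit copy to host; always allowed."""
        return self._buf.copy()

    def __array__(self, dtype=None, copy=None):
        # Refuse implicit conversion unless the user has opted in.
        if os.environ.get(_ENV_FLAG, "0") != "1":
            raise TypeError(
                "Implicit conversion to numpy.ndarray is disabled; "
                f"call to_numpy() or set {_ENV_FLAG}=1."
            )
        host = self.to_numpy()
        return host if dtype is None else host.astype(dtype)


x = ToyDeviceArray([1, 2, 3])

os.environ.pop(_ENV_FLAG, None)  # default: implicit conversion refused
try:
    np.asarray(x)
    refused = False
except TypeError:
    refused = True

os.environ[_ENV_FLAG] = "1"  # opt in: implicit conversion permitted
allowed = np.asarray(x)
os.environ.pop(_ENV_FLAG, None)
```

Reading the variable on every call keeps the sketch simple; a real implementation might instead read it once at import time.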

ndgrigorian avatar Aug 06 '25 00:08 ndgrigorian

Implicit conversion means GPU-resident array data is going to be copied to the host without explicit user control. This introduces a source of hard-to-pinpoint performance bottlenecks.
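
A rough sketch of how such copies can go unnoticed, assuming a hypothetical `CountingDeviceArray` that counts each call to `__array__` as one device-to-host transfer (a NumPy buffer stands in for device memory, and `third_party_norm` is an invented example of library code):

```python
import numpy as np


class CountingDeviceArray:
    """Hypothetical device array that counts device-to-host copies."""

    def __init__(self, data):
        self._buf = np.array(data, dtype=float)
        self.host_copies = 0

    def __array__(self, dtype=None, copy=None):
        # Each call stands for one device-to-host transfer.
        self.host_copies += 1
        host = self._buf.copy()
        return host if dtype is None else host.astype(dtype)


def third_party_norm(a):
    # Library code like this cannot know that `a` may live on a device.
    a = np.asarray(a)
    return float(np.sqrt((a * a).sum()))


x = CountingDeviceArray([3.0, 4.0])
for _ in range(3):
    third_party_norm(x)

print("hidden host copies:", x.host_copies)
```

Nothing at the call site of `third_party_norm` hints at a transfer, yet every iteration pays for one.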

CuPy does not do it, for a good reason. Computation on a GPU using data-parallel algorithms is generally much faster. So CuPy promotes coercion of np.ndarray to cp.ndarray by implementing the __array_function__ hook in cp.ndarray, so that cp.sin(inp: np.ndarray) actually returns a cp.ndarray on the default device.
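
As a rough illustration of the `__array_function__` protocol (NEP 18) mentioned above: when a NumPy function receives an object that defines this hook, NumPy hands the call to the hook, so the result can stay in that library's own array type. The class below is a hypothetical stand-in and not CuPy's implementation; note also that ufuncs such as `np.sin` dispatch through the separate `__array_ufunc__` hook, so this sketch uses `np.concatenate`.

```python
import numpy as np


class ToyDeviceArray:
    """Hypothetical array type; a NumPy buffer stands in for device memory."""

    def __init__(self, data):
        self._buf = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap ToyDeviceArray arguments (including inside lists/tuples),
        # run the NumPy function, and wrap array results again.
        def unwrap(obj):
            if isinstance(obj, ToyDeviceArray):
                return obj._buf
            if isinstance(obj, (list, tuple)):
                return type(obj)(unwrap(item) for item in obj)
            return obj

        result = func(*unwrap(args), **{k: unwrap(v) for k, v in kwargs.items()})
        return ToyDeviceArray(result) if isinstance(result, np.ndarray) else result


host = np.array([1, 2])
dev = ToyDeviceArray([3, 4])

# Mixing a host array with the toy device array yields the toy device type.
out = np.concatenate([host, dev])
print(type(out).__name__)
```

The direction of coercion is the point: the host array is pulled toward the device type rather than the device array being silently copied to the host.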

I hope this change does not get merged; it is a regression.

sycloid avatar Sep 17 '25 13:09 sycloid