
[data] Fix np.array crash with memory allocation error when source includes short an…

Open Ox0400 opened this issue 1 year ago • 1 comments

Issue: https://github.com/ray-project/ray/issues/46293

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 414. GiB for an array with shape (4900,) and data type <U22697406

type(udf_return_col)=<class 'list'>  len(udf_return_col)=4900 
type(udf_return_col[0])=<class 'str'> len(udf_return_col[0])=2576
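
The size in the error message can be checked by hand. The sketch below assumes the numbers reported above and the fact that NumPy's fixed-width `<U` dtype stores every element at the width of the longest string, using 4 bytes per character. The variable names are illustrative only and are not taken from the Ray code base.

```python
import numpy as np

# Figures taken from the error message above.
n_items = 4900            # array shape (4900,)
max_chars = 22_697_406    # width in the dtype <U22697406
bytes_per_char = 4        # NumPy stores each '<U' character as 4 bytes (UCS-4)

# Every element is padded to the longest string, so the cost is
# n_items * max_chars * bytes_per_char, regardless of how short most items are.
required_bytes = n_items * max_chars * bytes_per_char
required_gib = required_bytes / 2**30
print(f"{required_gib:.1f} GiB")

# The same effect at a small scale: one 4-character string forces
# a 16-byte slot for every element, including the 1-character one.
small = np.array(["a", "bbbb"])
print(small.dtype, small.itemsize, small.nbytes)
```

Under these assumptions, a single very long string among 4900 mostly short ones (the first element is only 2576 characters) is enough to produce the allocation request in the traceback.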

Why are these changes needed?

Related issue number

Checks

  • [ ] I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in doc/source/tune/api/ under the corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

Ox0400, Jun 27 '24 12:06

Timing test

>>> import time
>>> import numpy as np
>>> st=time.time(); x=np.array(['Hello'] * 100 + ['\r\ns'*10000000] , dtype=np.dtype('str')); print('use:', time.time() - st)
use: 9.69443154335022
>>> st=time.time(); x=np.array(['Hello'] * 100 + ['\r\ns'*10000000] , dtype=np.dtype('O')); print('use:', time.time() - st)
use: 1.9776394367218018
>>> st=time.time(); x=np.array(['Hello'] * 100 + ['\r\ns'*10000000] , dtype=np.dtype('O')); print('use:', time.time() - st)
use: 0.029134511947631836
>>> st=time.time(); x=np.array(['Hello'] * 100 + ['\r\ns'*10000000] , dtype=np.dtype('O')); print('use:', time.time() - st)
use: 0.005353212356567383
>>> st=time.time(); x=np.array(['Hello'] * 100 + ['\r\ns'*10000000] , dtype=np.dtype('str')); print('use:', time.time() - st)
use: 9.803117036819458
>>> st=time.time(); x=np.array(['Hello'] * 100 + ['\r\ns'*10000000] ); print('use:', time.time() - st)
use: 11.640169858932495
>>> st=time.time(); x=np.array(['Hello'] * 100 + ['\r\ns'*10000000] ); print('use:', time.time() - st)
use: 11.6232750415802
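
The comparison above can also be run as a standalone script. The following is a minimal sketch, not the change made in this PR: it uses a much shorter long string than the session above so that it runs quickly, and the helper `timed_array` is a hypothetical name introduced only for illustration.

```python
import time
import numpy as np

# Much smaller than the 30M-character string in the session above,
# so the fixed-width array stays around 121 MB instead of several GB.
long_str = "\r\ns" * 100_000          # 300,000 characters
data = ["Hello"] * 100 + [long_str]   # 101 items, mostly short


def timed_array(values, dtype=None):
    """Build an array from values and return it with the elapsed seconds."""
    start = time.perf_counter()
    arr = np.array(values, dtype=dtype)
    return arr, time.perf_counter() - start


# Default conversion: NumPy infers a fixed-width '<U300000' dtype,
# so all 101 slots are padded to 300,000 characters.
str_arr, str_secs = timed_array(data)

# Object dtype: the array only holds references to the Python strings.
obj_arr, obj_secs = timed_array(data, dtype=object)

print("str dtype:", str_arr.dtype, str_arr.nbytes, "bytes,", str_secs, "s")
print("obj dtype:", obj_arr.dtype, obj_arr.nbytes, "bytes,", obj_secs, "s")
```

With the object dtype the array size depends only on the number of items, not on the longest string, which is consistent with the large timing gap in the session above.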

Ox0400, Jun 27 '24 12:06

Hi @Ox0400 - could you also provide a reproducible script we can test against?

richardliaw, Mar 18 '25 00:03

Hi, I'm going to close this PR since it's outdated and unfortunately it's not clear what the end-user issue is.

richardliaw, Apr 10 '25 17:04

@richardliaw https://github.com/ray-project/ray/issues/46293

Ox0400, Apr 11 '25 04:04