[Feature] large data storage in Ray object store

Open chuckhope opened this issue 3 years ago • 3 comments

Search before asking

  • [X] I had searched in the issues and found no similar feature requirement.

Description

Hi, I am using Ray Tune for knowledge distillation with Hugging Face Transformers. Before training, I have to use Hugging Face's SquadV1Processor to extract features. My dataset is about 400MB, and after feature extraction it grows to about 80GB (the virtual memory shown in the 'htop' UI).

The solutions I have tried:

  1. Wrapping the data in the trainable function >>> ValueError: The actor ImplicitFunc is too large > FUNCTION_SIZE_ERROR_THRESHOLD=95 MiB
  2. Passing my processed features through tune.with_parameters instead >>> the program gets stuck without printing any info. I can see in the 'htop' UI that the program is still running, but with 80GB VIRT.
  3. Storing my processed features with ray.put and reading them back with ray.get >>> basically, I guess "ray.put" does the same thing as "tune.with_parameters".

Do you have any ideas that might help? Thank you!

Use case

Using Ray Tune to store a data ref larger than 80GB

Related issues

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

chuckhope avatar Dec 23 '21 09:12 chuckhope

That seems like a lot of data. Can you try running ray memory --stats-only to see the object store memory usage?

scv119 avatar Dec 30 '21 18:12 scv119

@scv119 Thanks for your response. I got the message "Plasma memory usage 20305MiB, 10 objects, 55.53% full, 55.53% needed Objects consumed by Ray tasks: 20305 MiB", and the program just got stuck without printing any info.
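As a rough reading of these figures, the object store capacity can be estimated from the reported usage. The sketch below assumes that "55.53% full" means used memory divided by total object store capacity; that interpretation is an assumption and is not confirmed anywhere in this thread.

```python
# Figures copied from the reported "ray memory --stats-only" message.
used_mib = 20305
percent_full = 55.53

# Assumption: percent_full = used / capacity * 100.
capacity_mib = used_mib / (percent_full / 100)
capacity_gib = capacity_mib / 1024

print(f"Estimated object store capacity: {capacity_mib:.0f} MiB ({capacity_gib:.1f} GiB)")
```

Under that assumption, the store on this machine would hold roughly 36 GiB in total, which is well below the 80GB mentioned in the use case.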

chuckhope avatar Jan 06 '22 03:01 chuckhope

Hi, I'm a bot from the Ray team :)

To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] avatar Aug 10 '22 03:08 stale[bot]

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale[bot] avatar Sep 20 '22 18:09 stale[bot]