
Ray Executor

Open alxmrs opened this issue 1 year ago • 6 comments

In addition to accelerator support (e.g. via #304), Cubed could benefit ML users by providing a Ray executor: https://docs.ray.io/en/latest/ray-core/walkthrough.html

Since Cubed follows a serverless model, I bet it could get away with only using Ray Tasks (remote functions).
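A rough sketch of what "only Tasks" could mean: each chunk-level operation is a stateless function submitted as a Ray task, with no actors involved. This is not Cubed's executor interface; `square_chunk` and `map_chunks` are hypothetical names for illustration, and the thread-pool fallback is only there so the snippet runs when Ray is not installed.

```python
from concurrent.futures import ThreadPoolExecutor

try:
    import ray  # optional dependency for this sketch
except ImportError:
    ray = None


def square_chunk(chunk):
    """Hypothetical stand-in for a per-chunk operation."""
    return [x * x for x in chunk]


def map_chunks(fn, chunks):
    """Run fn over every chunk as independent, stateless tasks."""
    if ray is None:
        # Fallback so the sketch still runs without Ray installed.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(fn, chunks))
    ray.init(ignore_reinit_error=True)
    remote_fn = ray.remote(fn)  # wrap a plain function as a Ray task
    futures = [remote_fn.remote(chunk) for chunk in chunks]
    # ray.get on a list blocks until all tasks finish and keeps input order.
    return ray.get(futures)


if __name__ == "__main__":
    print(map_chunks(square_chunk, [[1, 2], [3, 4]]))
```

A real executor would also need retries and some way to pass a memory requirement per task, neither of which is shown here.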

From talking with @cromwellian a bit, my hope is that Cubed could provide memory bounds when trying to saturate GPUs during model training. I'm not sure exactly what a training loop with Cubed would look like. Here's how Ray integrates with PyTorch, for example: https://docs.ray.io/en/latest/train/api/doc/ray.train.torch.TorchTrainer.html#ray.train.torch.TorchTrainer

@shoyer once pointed out to me that GPU OOM errors occur while taking the gradient of a function graph, not necessarily on the forward pass. I'm not sure yet whether Cubed is in fact a good fit for tackling this problem, only that the potential is exciting.

alxmrs, Jun 24 '24 21:06