
[storage] Add batch.put API for storage

Open chaokunyang opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe. Ray.put with an owner issues a synchronous RPC to the owner. When there are many objects to put, Ray.put becomes the bottleneck for operand execution, especially in the shuffle mapper stage. (image)

Describe the solution you'd like A batch.put API will eliminate the issue.

With batch put: (image). Without batch put: (image).

The API can be added to mars.storage.base.StorageBackend with a default implementation:

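async def batch_put(self, objects, importance: int = 0) -> List[ObjectInfo]:
    # Default fallback: put objects one by one. Not marked @abstractmethod,
    # so backends without native batching simply inherit this implementation.
    return [await self.put(obj, importance) for obj in objects]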

If the storage client supports batch.put, it can override this method to get better performance.
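To make the override idea concrete, here is a minimal, self-contained sketch. It is not Mars or Ray code: the ObjectInfo fields, the PerObjectBackend and BatchingBackend classes, and the round_trips counter are all hypothetical stand-ins, and the "round trip" is only simulated by a counter. It just illustrates how a default batch_put can fall back to per-object put while a backend that supports batching replaces it with a single call.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ObjectInfo:
    # Hypothetical stand-in for Mars' ObjectInfo; only the fields used here.
    object_id: int
    size: int


class StorageBackend(ABC):
    @abstractmethod
    async def put(self, obj: Any, importance: int = 0) -> ObjectInfo:
        """Store a single object."""

    async def batch_put(self, objects: List[Any], importance: int = 0) -> List[ObjectInfo]:
        # Default fallback: one put (and thus one round trip) per object.
        return [await self.put(obj, importance) for obj in objects]


class PerObjectBackend(StorageBackend):
    """Toy backend (assumption): each put counts as one simulated round trip."""

    def __init__(self) -> None:
        self.store: Dict[int, Any] = {}
        self.round_trips = 0

    async def put(self, obj: Any, importance: int = 0) -> ObjectInfo:
        self.round_trips += 1
        object_id = len(self.store)
        self.store[object_id] = obj
        return ObjectInfo(object_id=object_id, size=len(obj))


class BatchingBackend(PerObjectBackend):
    """Toy backend (assumption): the whole batch costs one simulated round trip."""

    async def batch_put(self, objects: List[Any], importance: int = 0) -> List[ObjectInfo]:
        self.round_trips += 1
        infos = []
        for obj in objects:
            object_id = len(self.store)
            self.store[object_id] = obj
            infos.append(ObjectInfo(object_id=object_id, size=len(obj)))
        return infos


async def demo():
    chunks = [b"a" * 10, b"b" * 20, b"c" * 30]
    plain, batched = PerObjectBackend(), BatchingBackend()
    await plain.batch_put(chunks)
    await batched.batch_put(chunks)
    # Simulated round trips: per-object fallback vs. batched override.
    return plain.round_trips, batched.round_trips


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch callers always use batch_put, so switching a backend to a native batched implementation needs no change at the call site.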

chaokunyang • Apr 18 '22 06:04

Looks reasonable. batch_get may also be useful for other storage backends.

hekaisheng • Apr 18 '22 06:04

Before this feature is implemented in Mars, Ray should first implement this API in its public releases.

wjsi • Apr 18 '22 09:04

@chaokunyang It would be better to track the related Ray issues or PRs here.

wjsi • May 17 '22 03:05

@Catch-Bull Will Ray support batch.put? If not, I'll close this issue.

chaokunyang • May 17 '22 07:05