[storage] Add batch.put API for storage
Is your feature request related to a problem? Please describe.
`ray.put` with an owner issues a synchronous RPC to the owner. When there are many objects to put, `ray.put` becomes the bottleneck for operand execution, especially in the shuffle mapper stage.

Describe the solution you'd like
A batch put API would eliminate this issue.
[Images: with batch put; without batch put]

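The API can be added to `mars.storage.base.StorageBackend` as a default implementation:

    async def batch_put(self, objects, importance: int = 0) -> List[ObjectInfo]:
        # Default fallback: put the objects one at a time.
        # `put` is a coroutine, so each call must be awaited.
        return [await self.put(obj, importance) for obj in objects]
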
If the storage client supports batch.put, it can override this method to get better performance.
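
To make the override idea concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical and for illustration only: `StorageBackend` is a simplified stand-in for the Mars base class, object ids are plain integers instead of `ObjectInfo`, and `round_trips` merely simulates the per-call RPC cost. It does not call Ray or Mars.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class StorageBackend(ABC):
    """Hypothetical, simplified stand-in for mars.storage.base.StorageBackend."""

    @abstractmethod
    async def put(self, obj: Any, importance: int = 0) -> int:
        """Store one object and return an illustrative integer id."""

    async def batch_put(self, objects: List[Any], importance: int = 0) -> List[int]:
        # Default fallback: one awaited put call per object.
        return [await self.put(obj, importance) for obj in objects]


class PerObjectBackend(StorageBackend):
    """Backend without native batching; counts simulated round trips."""

    def __init__(self) -> None:
        self.store: Dict[int, Any] = {}
        self.round_trips = 0

    async def put(self, obj: Any, importance: int = 0) -> int:
        self.round_trips += 1  # one simulated RPC per object
        object_id = len(self.store)
        self.store[object_id] = obj
        return object_id


class BatchingBackend(PerObjectBackend):
    """Backend that overrides batch_put to pay one round trip per batch."""

    async def batch_put(self, objects: List[Any], importance: int = 0) -> List[int]:
        self.round_trips += 1  # one simulated RPC for the whole batch
        ids = []
        for obj in objects:
            object_id = len(self.store)
            self.store[object_id] = obj
            ids.append(object_id)
        return ids


async def demo():
    plain, batched = PerObjectBackend(), BatchingBackend()
    data = ["a", "b", "c"]
    plain_ids = await plain.batch_put(data)
    batched_ids = await batched.batch_put(data)
    return plain_ids, plain.round_trips, batched_ids, batched.round_trips


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Callers use `batch_put` the same way on both backends; only the backend that overrides it avoids the per-object round trip.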
Looks reasonable. `batch_get` may also be useful for other storage backends.
Before this feature is implemented in Mars, Ray should first implement this API in its public releases.
@chaokunyang It would be better to track the related Ray issues or PRs here.
@Catch-Bull Will Ray support batch put? If not, I'll close this issue.