Limit number of partitions that data chunks of a "large" blob can reside on
Currently Ambry sprays the data chunks of a given "large" blob across all writable partitions (i.e., any partition is eligible to receive a data chunk). This can create some problems:
- If one chunk is lost, the whole blob is considered lost. Spraying the chunks across many partitions spread over many nodes increases the failure surface.
- One disk being slow or unavailable can have a multiplying effect if it stores a chunk from many large blobs.
To limit these effects, it may be useful to restrict the number of partitions that the data chunks of a particular "large" blob can reside on. Currently, GET operations request 4 chunks in parallel and maintain that level of parallelism until the blob is fully served, so it may be enough for 4 * x partitions (4, 8, 12, etc.) to receive the data chunks in order to satisfy any parallelism requirements.
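As a rough illustration (not Ambry's actual API), the sketch below bounds the pool of candidate partitions for a blob's data chunks to a multiple of the assumed GET-side parallelism of 4. The `Partition` type, the `pickPartitionPool` method, and the `GET_CHUNK_PARALLELISM` constant are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BoundedPartitionPool {
  // Assumed GET-side parallelism; the issue notes 4 chunks are requested in parallel.
  private static final int GET_CHUNK_PARALLELISM = 4;

  /**
   * Picks a random subset of the writable partitions whose size is a multiple of
   * GET_CHUNK_PARALLELISM (i.e. 4 * multiplier), capped at the total available.
   */
  static List<Partition> pickPartitionPool(List<Partition> writablePartitions, int multiplier) {
    int poolSize = Math.min(writablePartitions.size(), GET_CHUNK_PARALLELISM * multiplier);
    List<Partition> shuffled = new ArrayList<>(writablePartitions);
    Collections.shuffle(shuffled);
    return shuffled.subList(0, poolSize);
  }

  /** Hypothetical stand-in for Ambry's partition abstraction. */
  static class Partition {
    final String id;
    Partition(String id) { this.id = id; }
  }
}
```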
The partition picking can be random to start with and can be made more intelligent based on requirements (which need to be evaluated), for example:
- Pick from distinct disks and nodes
- Pick from distinct disks but limit number of nodes across which these disks are spread (to lessen impact of node failures)
- ...

Each of these options comes with its own upsides and downsides; a sketch of the first one follows.
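Below is a rough sketch of the "pick from distinct disks and nodes" option, assuming for simplicity that each partition is represented by a single representative replica with a known disk and node. The `Partition` fields and the `pickDistinct` method are hypothetical and not part of Ambry's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistinctPlacementPicker {
  /**
   * Greedily picks up to poolSize partitions such that no two picked partitions
   * share a disk or a node for their representative replica.
   */
  static List<Partition> pickDistinct(List<Partition> candidates, int poolSize) {
    List<Partition> shuffled = new ArrayList<>(candidates);
    Collections.shuffle(shuffled);
    Set<String> usedDisks = new HashSet<>();
    Set<String> usedNodes = new HashSet<>();
    List<Partition> picked = new ArrayList<>();
    for (Partition p : shuffled) {
      if (picked.size() == poolSize) {
        break;
      }
      // Skip partitions whose replica lands on a disk or node already in use.
      if (usedDisks.contains(p.diskId) || usedNodes.contains(p.nodeId)) {
        continue;
      }
      usedDisks.add(p.diskId);
      usedNodes.add(p.nodeId);
      picked.add(p);
    }
    return picked;
  }

  /** Hypothetical partition with the disk and node hosting one of its replicas. */
  static class Partition {
    final String id, diskId, nodeId;
    Partition(String id, String diskId, String nodeId) {
      this.id = id;
      this.diskId = diskId;
      this.nodeId = nodeId;
    }
  }
}
```

The second option (distinct disks but a bounded set of nodes) would add a cap on the size of `usedNodes` instead of requiring every node to be distinct.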