elephant-bird icon indicating copy to clipboard operation
elephant-bird copied to clipboard

Refine implementation in SplitUtils -- better locality awareness

Open jcoveney opened this issue 10 years ago • 3 comments

This is a refinement of #398

We want to make sure that empty splits are not dropped, but more radically, we want better incorporation of locality. @gerashegalov, @sjlee, would love your comments and ideas.

Ideally, #401 would have been completed and we would have more robust regression testing.

jcoveney avatar May 28 '14 23:05 jcoveney

I assume you want me to elaborate on my idea in #398. The current algorithm computes top 5 locations for the combined split using the number of splits location x occurred in the "real" splits being combined. I suggest to sort locations based on cumulative split.getLength() instead to accommodate for splits of greatly varying sizes.

To illustrate the point in its extreme, consider location1 with 1000 0-byte splits and location2 1 with 512mb-split. Location2 should be preferred as the location of the combined split. To break the tie between locations contributing the same number of bytes, the number of splits should be considered as the minor sort key. That will work fine with empty splits as well. (x1 bytes, y1 splits) <= (x2 bytes, y2 splits) if (x1 < x2 || x1 == x2 && y1 <= y2)

gerashegalov avatar Jun 01 '14 10:06 gerashegalov

Then again: http://www.cs.berkeley.edu/~ganesha/disk-irrelevant_hotos2011.pdf

Do we need to actually worry about this?

dvryaboy avatar Jun 04 '14 09:06 dvryaboy

There is this trend indeed. However ... We are talking about a general notion of locality. What location means in hadoop will also follow this trend. E.g. we could have whole cache hierarchy represented via the hadoop tree topology /dc/rack/hdd/ssd/ram/l2/l1. If the data on hdd's is replicated we can refine it by splitting into inner/outer tracks. In current data centers the classical disk locality matters. It is well the case that for some systems with infiniband even now it does not matter, we can ditch rack from the hierarchy of racks connected by infiniband. Then we would argue that it's not smart to run the a job across dc's, so we would ditch this as well. However, the hierarchy can be deeper than this even for this case.

gerashegalov avatar Jun 05 '14 20:06 gerashegalov