
Revisit estimates of thread counts, process limits, file descriptor limits, etc.

Open · chu11 opened this issue 7 years ago • 2 comments

Newer systems may have hyper-threading enabled, and core counts are now much larger than before. The number of threads that daemons (for example NameNode, DataNode, etc.) should use to handle communications was previously estimated from the node count. These estimates may now be out of date and may need to be calculated differently. Revisit these calculations.

In addition, the maximum number of tasks per node (such as in Hadoop or Spark) may also need to be re-estimated. While 8-24 cores per node may have been common in the past, with hyper-threading 48-64 is not unreasonable. The trade-off of running more threads/tasks may no longer favor big data applications. Reconsider how the maximum number of threads/tasks per node is determined.
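To make the hyper-threading question concrete, here is a minimal Python sketch of one possible per-node task-count policy. The function name `max_tasks_per_node`, its parameters, and the choice to default to one task per physical core are hypothetical illustrations, not Magpie's current logic. Note that `os.cpu_count()` reports logical CPUs, so the threads-per-core value has to come from elsewhere (on Linux, for example, the "Thread(s) per core" line of `lscpu`).

```python
import os


def max_tasks_per_node(logical_cpus: int, threads_per_core: int = 1,
                       count_hyperthreads: bool = False) -> int:
    """Hypothetical policy for sizing the per-node task limit.

    By default, allow one task per physical core. If count_hyperthreads
    is True, allow one task per logical CPU (hardware thread) instead.
    """
    if logical_cpus < 1 or threads_per_core < 1:
        raise ValueError("logical_cpus and threads_per_core must be >= 1")
    if count_hyperthreads:
        return logical_cpus
    # Physical cores = logical CPUs / hardware threads per core (at least 1).
    return max(1, logical_cpus // threads_per_core)


if __name__ == "__main__":
    # os.cpu_count() returns the number of logical CPUs, or None if unknown.
    logical = os.cpu_count() or 1
    # Assumption: 2 hardware threads per core (typical with hyper-threading on).
    print(max_tasks_per_node(logical, threads_per_core=2))
```

For example, a node reporting 48 logical CPUs with 2 threads per core would get 24 tasks under the default policy and 48 with `count_hyperthreads=True`; which of the two better serves big data workloads is exactly the open question above.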

chu11, Mar 25 '17 00:03

In addition, prior estimates of per-process file descriptor limits, process limits, etc. may need to be reconsidered.
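Before revising these estimates, it can help to see what limits a node actually enforces. A minimal Python sketch using the standard `resource` module (Unix only) reads the soft and hard limits for open files (`RLIMIT_NOFILE`) and user processes (`RLIMIT_NPROC`); the helper names `current_limits` and `fmt_limit` are hypothetical.

```python
import resource


def current_limits() -> dict:
    """Return (soft, hard) limits for open files and user processes."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE),
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC),
    }


def fmt_limit(value: int) -> str:
    """Render RLIM_INFINITY as 'unlimited' for readability."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={fmt_limit(soft)} hard={fmt_limit(hard)}")
```

Comparing these values against the number of daemon threads and tasks expected on a high-core-count node is one way to check whether the old fd and process limit estimates still hold.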

chu11, Apr 18 '17 23:04

In addition, revisit the default reducer count in Terasort. Is 2 reducers per node still reasonable?

chu11, Apr 26 '17 14:04