magpie
Re-visit estimates on thread counts, process limits, file descriptor limits, etc.
Newer systems may have hyper-threading enabled, and core counts are now much larger than before. Estimates for the number of threads daemons should use to handle communications (for example in the NameNode, DataNode, etc.) were previously based on node count. These estimates may now be out of date and may need to be calculated differently. Revisit the calculations behind these estimates.
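For reference, one widely cited Hadoop heuristic sizes dfs.namenode.handler.count at roughly 20 * log2(cluster size). A minimal sketch of that kind of node-count-based estimate (the function name and minimum value are illustrative, not Magpie's actual code):

```python
import math

def namenode_handler_count(num_datanodes):
    """Common Hadoop heuristic: 20 * log2(cluster size), floor of 10.

    Illustrative only; Magpie's actual node-count-based estimate may
    differ and is exactly the thing this issue says to revisit.
    """
    if num_datanodes < 2:
        return 10
    return max(10, int(20 * math.log2(num_datanodes)))

print(namenode_handler_count(64))   # 120 for a 64-node cluster
```

Whether a purely node-count-based formula like this still holds on dense, hyper-threaded nodes is the open question.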
In addition, the maximum number of tasks (such as in Hadoop or Spark) may also need to be re-estimated. While 8-24 cores per node may have been common in the past, with hyper-threading 48-64 is not unreasonable. The trade-off of more threads/tasks may no longer be balanced in favor of big data applications. Reconsider how the maximum number of threads/tasks per node is determined.
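A sketch of the kind of recalculation being suggested: discount logical cores when hyper-threading is on, rather than treating them as full cores. The discount factor and reserved-core count below are hypothetical assumptions, not Magpie settings:

```python
def max_tasks_per_node(logical_cores, hyperthreading=True,
                       ht_discount=0.75, reserve_cores=2):
    """Estimate task slots per node.

    With hyper-threading, logical cores overstate usable parallelism
    for CPU-bound big-data tasks, so discount them (ht_discount is an
    illustrative assumption). Also reserve a few cores for daemons and
    the OS.
    """
    usable = logical_cores * ht_discount if hyperthreading else logical_cores
    return max(1, int(usable) - reserve_cores)

print(max_tasks_per_node(64))                       # 46 on a 64-thread node
print(max_tasks_per_node(16, hyperthreading=False)) # 14 on an older node
```

The point is that 64 logical cores should probably not translate directly into 64 task slots.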
In addition, prior estimates for per-process file descriptor limits, process limits, etc. may need to be reconsidered.
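Any recalculated estimates should be checked against the limits actually in effect on the node; one way to read them is Python's standard resource module:

```python
import resource

# Per-process open file descriptor limits (soft, hard).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limits: soft={soft}, hard={hard}")

# Per-user process/thread limits (Linux).
soft_np, hard_np = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"process limits: soft={soft_np}, hard={hard_np}")
```

Estimates that exceed the soft limits here will fail at runtime regardless of how the thread/task math works out.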
In addition, the default reducer count in TeraSort should be revisited. Is 2 per node still a reasonable default?
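The old per-node rule can be contrasted with a core-scaled alternative; both formulas below are illustrative sketches, not Magpie's current defaults:

```python
def terasort_reducers_per_node_rule(nodes, per_node=2):
    """Legacy rule: a fixed number of reducers per node."""
    return nodes * per_node

def terasort_reducers_core_rule(nodes, cores_per_node, fraction=0.5):
    """Alternative: scale reducers with a fraction of total cores
    (fraction is an illustrative assumption)."""
    return max(nodes, int(nodes * cores_per_node * fraction))

print(terasort_reducers_per_node_rule(16))   # 32 reducers on 16 nodes
print(terasort_reducers_core_rule(16, 64))   # 512 on dense 64-thread nodes
```

On hyper-threaded nodes the two rules diverge by more than an order of magnitude, which is why the default deserves a second look.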