[ZEPPELIN-4283] Choose which server to run the interpreter process

Open xunliu opened this issue 5 years ago • 7 comments

What is this PR for?

Currently, when an interpreter is created in cluster mode, it runs on the server that reports the lowest memory usage among all servers in the cluster. However, the JDK does not report memory usage accurately. So, modify to the average policy and start the same number of interpreters in each server.

The operating system's memory usage obtained through the OperatingSystemMXBean differs considerably from the actual physical memory usage. The CPU information is accurate.

import java.lang.management.ManagementFactory;
// The physical-memory and CPU-load getters below come from the com.sun.management extension interface.
import com.sun.management.OperatingSystemMXBean;

OperatingSystemMXBean operatingSystemMXBean
    = ManagementFactory.getPlatformMXBean(OperatingSystemMXBean.class);

// Returns the amount of free physical memory in bytes.
long freePhysicalMemorySize = operatingSystemMXBean.getFreePhysicalMemorySize();

// Returns the total amount of physical memory in bytes.
long totalPhysicalMemorySize = operatingSystemMXBean.getTotalPhysicalMemorySize();

// Returns the "recent cpu usage" for the whole system.
double systemCpuLoad = operatingSystemMXBean.getSystemCpuLoad();

// Returns the number of processors available to the JVM.
int process = Runtime.getRuntime().availableProcessors();
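
One possible way to quantify the gap on a Linux host is to print the value reported by the MXBean next to the MemAvailable field of /proc/meminfo. This is only a sketch and not part of the PR: the class name MemoryGapCheck and the helper parseMemAvailableBytes are hypothetical, and it assumes a HotSpot-based JDK and a kernel that exposes MemAvailable.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical diagnostic class; not part of Zeppelin.
class MemoryGapCheck {

    // Finds the "MemAvailable:  <n> kB" line in /proc/meminfo-style text
    // and returns the value in bytes, or -1 if the line is absent.
    static long parseMemAvailableBytes(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith("MemAvailable:")) {
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]) * 1024L;
            }
        }
        return -1L;
    }

    public static void main(String[] args) throws IOException {
        com.sun.management.OperatingSystemMXBean os =
            ManagementFactory.getPlatformMXBean(com.sun.management.OperatingSystemMXBean.class);
        long jvmFree = os.getFreePhysicalMemorySize();
        System.out.println("Free bytes reported by the MXBean: " + jvmFree);

        // Assumption: Linux host; other platforms do not have /proc/meminfo.
        Path meminfo = Paths.get("/proc/meminfo");
        if (Files.isReadable(meminfo)) {
            long available = parseMemAvailableBytes(Files.readAllLines(meminfo));
            if (available >= 0) {
                System.out.println("MemAvailable bytes from /proc/meminfo: " + available);
                System.out.println("Difference in bytes: " + (available - jvmFree));
            }
        }
    }
}
```

Running this on each server in the cluster would show how far the two numbers diverge on that machine.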

What type of PR is it?

Improvement

What is the Jira issue?

https://issues.apache.org/jira/browse/ZEPPELIN-4283

How should this be tested?

CI Pass

Screenshots (if appropriate)

Questions:

  • Do the license files need an update? No
  • Are there breaking changes for older versions? No
  • Does this need documentation? No

xunliu, Sep 10 '19 09:09

Can anyone help me review the code? Thank you!

xunliu, Sep 11 '19 13:09

> So, modify to the average policy and start the same number of interpreters in each server.

Then it would require all the machines in the cluster to be the same (with the same memory and CPU cores). Do I understand it correctly?

zjffdu, Sep 11 '19 13:09

Not exactly. Java can't correctly obtain the actual physical memory usage of the server, so the policy is changed: a new interpreter is now created on the server with the fewest interpreter processes.

For example, suppose there are three Zeppelin servers, A, B, and C. Server A has one interpreter process, server B has two, and server C has two.

If a new interpreter process needs to be created, it will be created on server A, because server A has the fewest interpreter processes.
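
The selection rule described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the class name LeastInterpreterPicker, the method pick, and the use of a plain map from server name to process count are all assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not Zeppelin's actual scheduler code.
class LeastInterpreterPicker {

    // Returns the server with the fewest interpreter processes.
    // On a tie, the first server in the map's iteration order wins.
    static String pick(Map<String, Integer> interpreterCounts) {
        if (interpreterCounts.isEmpty()) {
            throw new IllegalArgumentException("no servers in the cluster");
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : interpreterCounts.entrySet()) {
            if (best == null || entry.getValue() < bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The A/B/C example from the comment above.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("A", 1);
        counts.put("B", 2);
        counts.put("C", 2);
        System.out.println(pick(counts)); // prints "A"
    }
}
```

Note that this rule looks only at process counts; it does not account for differences in server capacity or in how much memory each interpreter uses.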

xunliu, Sep 11 '19 14:09

This is what concerns me. Why do you think choosing A is correct? IIUC, we have to assume that A, B, and C have the same computing resources (CPU cores and memory), and besides that we also have to assume that all interpreter processes consume the same amount of memory. But I don't think these assumptions hold in reality.

Do you know how large the gap is between the actual memory and the number calculated by Java? It seems weird to me that we cannot get the free memory programmatically; it is a very common problem.

zjffdu, Sep 11 '19 15:09

The reality is that the deviation in the memory figures obtained by Java is very large, which causes the interpreters to be concentrated on one server. Server resource differences generally do not exceed 30%. So scheduling by quantity is better than following the way of resources.

xunliu, Sep 12 '19 01:09

> So scheduling by quantity is better than following the way of resources.

This is based on the two assumptions I mentioned above, right?

zjffdu, Sep 12 '19 03:09

In two cases:

  1. Zeppelin cluster + intp on localhost mode: There are 2 problems.
  • First, the memory usage obtained by Java is very inaccurate. In our production environment, the intp allocation skew was very serious.

  • Second, after an intp starts, the user can sometimes close it, so there is no strictly fair scheduling.

In our production environment, distribution by quantity has been the most ideal state.

  2. Zeppelin cluster + intp on docker (or yarn, k8s) mode: Scheduling according to the number of intp is an ideal mode.

xunliu, Sep 12 '19 04:09