incubator-myriad icon indicating copy to clipboard operation
incubator-myriad copied to clipboard

MYRIAD-249 Should set NodeManager vcores more flexibly

Open taojieterry opened this issue 8 years ago • 17 comments

add field vcoreMultplier in NodeManagerConfiguration. Then vcore of NodeManager would be cpu in profile muliply vcoreMultifier. eg:

profiles:
  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
  small:
    cpu: 2
    mem: 2048
  medium:
    cpu: 4
    mem: 4096
  large:
    cpu: 10
    mem: 12288
nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1 # <profile_name : instances>
rebalancer: false
haEnabled: false
nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: false
  vcoreMultiplier: 2

Once flex up medium-size nodemanager, it consumes 4.4 CPUs on mesos, and launched nodemanager has 8 vcores.

taojieterry avatar Nov 29 '16 10:11 taojieterry

I like the idea but think the profiles should reflect the vCores of the NM, so maybe change vcoreMultiplier to vcoreFraction (vcoreRatio??), so a medium size NM would still have 4 vcores but use only 2.2 cpus, easier book-keeping.

Also, you'll need to do some checks for FGS in NMHeartBeatManage and YarnNodeCapacityManager.

DarinJ avatar Nov 29 '16 14:11 DarinJ

@DarinJ ,I don't quite understand that the profiles should reflect the vCores of the NM. Did you mean that we should config it like:

profiles:
  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
    vcoreRatio: 2
  small:
    cpu: 2
    mem: 2048
    vcoreRatio: 2
  medium:
    cpu: 4
    mem: 4096
    vcoreRatio: 3
  large:
    cpu: 10
    mem: 12288
    vcoreRatio: 3

Should we have different vcoreRatio for each size of NM or a global field is enough? Further more, I have tested it with CGroup disabled. I am not sure it will go well if I turn Cgroup on.

taojieterry avatar Nov 30 '16 10:11 taojieterry

@taojieterry sorry if I wasn't clear, so I think the config should look like

profiles:
  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
  small:
    cpu: 2
    mem: 2048
  medium:
    cpu: 4
    mem: 4096
  large:
    cpu: 10
    mem: 12288
nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1 # <profile_name : instances>
rebalancer: false
haEnabled: false
nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: false
  vcoreRatio: .5

And if you start a medium profile NM it would launch with 4 vcores and 4096G mem, but only consume 2.2 mesos cpu shares. I think this will eliminate a small amount of confusion vs the original where the NM would have 8 vcores and 4096GB mem.

Make sense?

I think that things should work with cgroups. I can help you with the configuration necessary for testing and can test myself on some aws resources.

DarinJ avatar Nov 30 '16 14:11 DarinJ

@DarinJ thank you for reply and it makes sense to me now. BTW, should we change "cpu" in profiles to "vcore" to make it less confusing? Maybe "cpu" should be still acceptable to keep compatible with earlier config file.

taojieterry avatar Nov 30 '16 15:11 taojieterry

I think you're on the right track

DarinJ avatar Dec 01 '16 22:12 DarinJ

@DarinJ , is there anything else I can do before this patch get merged?

taojieterry avatar Dec 06 '16 01:12 taojieterry

@taojieterry see my comment about mem vs cpu on line 55 of myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/OfferUtils.java. Also, we'll need to address the vcore ratio for fine grained scaling. The files you'll need to modify are incubator-myriad/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/NMHeartBeatHandler.java and incubator-myriad/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/YarnNodeCapacityManager.java.

@yufeldman do you have any comments?

DarinJ avatar Dec 06 '16 15:12 DarinJ

@taojieterry I'll probably be testing you're other two PRs and merging the late this week. I'd like to merge this one at the same time but correctness is more important than expediency.

DarinJ avatar Dec 06 '16 15:12 DarinJ

Sorry for update this so late. I am out of town this week, and the patch will be updated before next Friday.

taojieterry avatar Dec 11 '16 14:12 taojieterry

@taojieterry no worries, looking forward to the updates when they're ready!

DarinJ avatar Dec 12 '16 16:12 DarinJ

@DarinJ , I took a look into NMHeartBeatHandler.java and YarnNodeCapacityManager.java, and found all logic that convert resource from mesos offer to Yarn resource is in OfferUtils.getYarnResourcesFromMesosOffers. It seems to me that there is no need to modify NMHeartBeatHandler.java and YarnNodeCapacityManager.java any more. is it?

taojieterry avatar Dec 14 '16 01:12 taojieterry

@taojieterry I'll take a look to make sure but you're likely correct.

DarinJ avatar Dec 14 '16 04:12 DarinJ

Updated the patch and added some unittests, also I tested manually in my own environment(also tested fgs). @DarinJ please take another look on this patch, thank you in advance.

taojieterry avatar Dec 15 '16 02:12 taojieterry

@taojieterry Will checkout this week. Do you think it's at the point you'd like it to be merged?

DarinJ avatar Dec 20 '16 20:12 DarinJ

Thank you @DarinJ, I think this patch and Myraid-247&250 are ready now. It has been running stably in my environment for a while. However I found it difficult to add unit test for Myriad-250, since we don't have something like mimiCluster to simulate the whole environment. If you have some advise, I would like to pay more effort on this.

taojieterry avatar Dec 21 '16 02:12 taojieterry

Hi @DarinJ , would you mind getting those patches merged to master branch recently? It has been waiting here for a while:)

taojieterry avatar Feb 03 '17 07:02 taojieterry

@taojieterry sorry got caught up in a new job. Will try to find some time this week.

DarinJ avatar Feb 05 '17 14:02 DarinJ