incubator-myriad
MYRIAD-249 Should set NodeManager vcores more flexibly
Add a field vcoreMultiplier to NodeManagerConfiguration. Then the vcores of a NodeManager would be the cpu in its profile multiplied by vcoreMultiplier, e.g.:
```yaml
profiles:
  zero: # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
  small:
    cpu: 2
    mem: 2048
  medium:
    cpu: 4
    mem: 4096
  large:
    cpu: 10
    mem: 12288
nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1 # <profile_name : instances>
rebalancer: false
haEnabled: false
nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: false
  vcoreMultiplier: 2
```
Once a medium-size NodeManager is flexed up, it consumes 4.4 CPUs on Mesos, and the launched NodeManager has 8 vcores.
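A minimal sketch of the arithmetic behind this proposal (class and method names here are illustrative, not Myriad's actual API; the split of the 0.4 CPU overhead into NM and executor shares is an assumption, since only `nodemanager.cpus: 0.2` appears in the config above):

```java
// Illustrative sketch of the vcoreMultiplier proposal. Names are hypothetical,
// not Myriad's actual API; the 0.2 + 0.2 breakdown of the overhead is an
// assumption about NM and executor CPU shares.
public class VcoreMultiplierExample {

    // NM vcores = profile cpu * vcoreMultiplier
    static int nmVcores(int profileCpu, int vcoreMultiplier) {
        return profileCpu * vcoreMultiplier;
    }

    // Mesos CPUs consumed = profile cpu + overhead (e.g. nodemanager.cpus
    // plus an executor share), which is one way 4 + 0.2 + 0.2 = 4.4 arises.
    static double mesosCpus(int profileCpu, double nmCpus, double executorCpus) {
        return profileCpu + nmCpus + executorCpus;
    }

    public static void main(String[] args) {
        // medium profile (cpu: 4) with vcoreMultiplier: 2
        System.out.println(nmVcores(4, 2));         // 8 vcores
        System.out.println(mesosCpus(4, 0.2, 0.2)); // 4.4 Mesos CPUs
    }
}
```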
I like the idea, but I think the profiles should reflect the vCores of the NM, so maybe change vcoreMultiplier to vcoreFraction (vcoreRatio??). A medium-size NM would still have 4 vcores but use only 2.2 cpus; easier book-keeping.
Also, you'll need to do some checks for FGS in NMHeartBeatHandler and YarnNodeCapacityManager.
@DarinJ, I don't quite understand what "the profiles should reflect the vCores of the NM" means. Did you mean that we should configure it like:
```yaml
profiles:
  zero: # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
    vcoreRatio: 2
  small:
    cpu: 2
    mem: 2048
    vcoreRatio: 2
  medium:
    cpu: 4
    mem: 4096
    vcoreRatio: 3
  large:
    cpu: 10
    mem: 12288
    vcoreRatio: 3
```
Should we have a different vcoreRatio for each NM size, or is a global field enough? Furthermore, I have tested it with cgroups disabled; I am not sure it will go well if I turn cgroups on.
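One way the per-profile vs. global question could be resolved (purely a sketch with hypothetical names, not anything in Myriad's codebase) is to let a profile-level vcoreRatio override a global default:

```java
// Hypothetical resolution order for vcoreRatio: a profile-level setting, if
// present, overrides a global nodemanager-level default; 1.0 (vcores == cpus)
// preserves today's behavior when neither is configured.
public class VcoreRatioResolver {

    static double effectiveRatio(Double profileRatio, Double globalRatio) {
        if (profileRatio != null) {
            return profileRatio;
        }
        return globalRatio != null ? globalRatio : 1.0;
    }

    public static void main(String[] args) {
        System.out.println(effectiveRatio(3.0, 2.0));  // profile wins: 3.0
        System.out.println(effectiveRatio(null, 2.0)); // global default: 2.0
        System.out.println(effectiveRatio(null, null)); // fallback: 1.0
    }
}
```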
@taojieterry sorry if I wasn't clear. I think the config should look like:
```yaml
profiles:
  zero: # NMs launched with this profile dynamically obtain cpu/mem from Mesos
    cpu: 0
    mem: 0
  small:
    cpu: 2
    mem: 2048
  medium:
    cpu: 4
    mem: 4096
  large:
    cpu: 10
    mem: 12288
nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
  medium: 1 # <profile_name : instances>
rebalancer: false
haEnabled: false
nodemanager:
  jvmMaxMemoryMB: 1024
  cpus: 0.2
  cgroups: false
  vcoreRatio: .5
```
And if you start a medium-profile NM, it would launch with 4 vcores and 4096 MB mem, but only consume 2.2 Mesos CPU shares. I think this will eliminate a small amount of confusion vs. the original, where the NM would have 8 vcores and 4096 MB mem.
Make sense?
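A minimal sketch of the vcoreRatio bookkeeping under this proposal (hypothetical names, assuming the only overhead on top of the profile is the `nodemanager.cpus` value):

```java
// Illustrative sketch of the vcoreRatio proposal: the profile's cpu value is
// what the NM advertises as vcores, and the Mesos consumption is scaled by
// the ratio. Names are hypothetical, not Myriad's actual API.
public class VcoreRatioExample {

    // Mesos CPUs = profile vcores * vcoreRatio + NM overhead (nodemanager.cpus)
    static double mesosCpus(int profileVcores, double vcoreRatio, double overheadCpus) {
        return profileVcores * vcoreRatio + overheadCpus;
    }

    public static void main(String[] args) {
        // medium profile: NM still reports 4 vcores, but with vcoreRatio: 0.5
        // only 4 * 0.5 + 0.2 = 2.2 Mesos CPU shares are consumed.
        System.out.println(mesosCpus(4, 0.5, 0.2)); // 2.2
    }
}
```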
I think things should work with cgroups. I can help you with the configuration necessary for testing, and I can test myself on some AWS resources.
@DarinJ thank you for the reply; it makes sense to me now. BTW, should we change "cpu" in profiles to "vcore" to make it less confusing? Maybe "cpu" should still be accepted to stay compatible with earlier config files.
I think you're on the right track
@DarinJ, is there anything else I can do before this patch gets merged?
@taojieterry see my comment about mem vs cpu on line 55 of myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/OfferUtils.java. Also, we'll need to address the vcore ratio for fine grained scaling. The files you'll need to modify are incubator-myriad/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/NMHeartBeatHandler.java and incubator-myriad/myriad-scheduler/src/main/java/org/apache/myriad/scheduler/fgs/YarnNodeCapacityManager.java.
@yufeldman do you have any comments?
@taojieterry I'll probably be testing your other two PRs and merging them late this week. I'd like to merge this one at the same time, but correctness is more important than expediency.
Sorry for updating this so late. I am out of town this week; the patch will be updated before next Friday.
@taojieterry no worries, looking forward to the updates when they're ready!
@DarinJ, I took a look at NMHeartBeatHandler.java and YarnNodeCapacityManager.java and found that all the logic that converts resources from a Mesos offer into YARN resources is in OfferUtils.getYarnResourcesFromMesosOffers. It seems to me that there is no need to modify NMHeartBeatHandler.java or YarnNodeCapacityManager.java anymore. Is that right?
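If the conversion really is centralized there, applying the ratio during offer translation could look roughly like this (a sketch only; this is not the actual body of OfferUtils.getYarnResourcesFromMesosOffers, and the class and method names are hypothetical):

```java
// Sketch of how a vcoreRatio might be applied when translating Mesos offer
// CPU shares into YARN vcores for fine-grained scaling. Illustrative only;
// not the real implementation in Myriad's OfferUtils.
public class OfferConversionSketch {

    // With vcoreRatio r, each YARN vcore costs r Mesos CPU shares, so an
    // offer of c CPU shares yields floor(c / r) whole vcores.
    static int yarnVcoresFromOfferCpus(double offerCpus, double vcoreRatio) {
        return (int) Math.floor(offerCpus / vcoreRatio);
    }

    public static void main(String[] args) {
        System.out.println(yarnVcoresFromOfferCpus(2.0, 0.5)); // 4
        System.out.println(yarnVcoresFromOfferCpus(1.1, 0.5)); // 2
    }
}
```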
@taojieterry I'll take a look to make sure but you're likely correct.
Updated the patch and added some unit tests; I also tested manually in my own environment (including FGS). @DarinJ please take another look at this patch, thank you in advance.
@taojieterry Will check it out this week. Do you think it's at the point where you'd like it to be merged?
Thank you @DarinJ, I think this patch and MYRIAD-247 & MYRIAD-250 are ready now. They have been running stably in my environment for a while. However, I found it difficult to add a unit test for MYRIAD-250, since we don't have something like a MiniCluster to simulate the whole environment. If you have any advice, I would be happy to put more effort into this.
Hi @DarinJ, would you mind getting those patches merged to the master branch soon? They have been waiting here for a while :)
@taojieterry sorry, I got caught up in a new job. I will try to find some time this week.