volcano
A Cloud Native Batch System (Project under CNCF)
When a glitch happens in a production environment and we need to figure out the cause, one option is to change the log level and reproduce the issue to get more verbose...
Some Volcano users, e.g. Cruise, also use Ray (https://github.com/ray-project/ray). We want to make Volcano and Ray work together more effectively. - Investigate the integration solution for...
As the [Rescheduling Design Doc](https://github.com/volcano-sh/volcano/blob/master/docs/design/rescheduling.md) describes and https://github.com/volcano-sh/volcano/pull/2184 currently implements, all pods are regarded as potential victims. This is not reasonable in some scenarios. For example, team1 and...
In [v1.6.0](https://github.com/volcano-sh/volcano/releases/tag/v1.6.0), Volcano added support for [Dynamic Scheduling Based on Real Node Load](https://github.com/volcano-sh/volcano/blob/master/docs/design/usage-based-scheduling.md). However, some gaps remain. For example, in a production environment, since the load of a Pod does...
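Related issue #2364. As queues gain more configuration options, including guarantee/capacity/weight as well as the queue's resource request, the logic that computes queue.deserved from these settings keeps getting more complex. Contradictory settings can also occur, such as a queue with a large guarantee but a weight that conflicts with it. queue.deserved can no longer be computed by a simple rule, so a new computation logic is needed. guarantee and capacity are hard constraints; within those constraints, queue.deserved should match the weight ratio and the queue's actual need/request as closely as possible. One idea: a queue's deserved value is not computed but randomly generated (restricted to the range `[guarantee,capacity]`, of course), which turns the problem of computing deserved into the problem of picking the best candidate out of 1,000 or 10,000 randomly generated ones. How do we decide that deserved1 is better than deserved2? We can borrow the "minimize the loss function" idea from machine learning. Suppose there are 3 queues and only cpu and memory are considered. A random candidate `(queue1_deserved,queue2_deserved,queue3_deserved)` (in machine learning terms, effectively a 3*2 tensor) can be considered the best if it fits the queue weights `(queue1_weight,queue2_weight,queue3_weight)` as closely as possible and also fits the queue requests `(queue1_request,queue2_request,queue3_request)` as closely as possible....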
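The random-search idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Volcano's implementation: the function names, the `total` cluster-resource vector, and the specific squared-error loss are all hypothetical, and the sketch does not check that the sampled deserved values sum to at most the cluster total.

```python
import random


def sample_candidate(guarantee, capacity, rng):
    """Draw one deserved value per (queue, resource) inside [guarantee, capacity]."""
    return [
        [rng.uniform(g, c) for g, c in zip(g_row, c_row)]
        for g_row, c_row in zip(guarantee, capacity)
    ]


def loss(candidate, weights, requests, total):
    """Hypothetical loss: squared distance to the weight-proportional share
    plus squared distance to the request, normalized by the cluster total."""
    weight_sum = sum(weights)
    value = 0.0
    for q, row in enumerate(candidate):
        for r, deserved in enumerate(row):
            weighted_share = total[r] * weights[q] / weight_sum
            value += ((deserved - weighted_share) / total[r]) ** 2
            value += ((deserved - requests[q][r]) / total[r]) ** 2
    return value


def pick_deserved(guarantee, capacity, weights, requests, total,
                  samples=1000, seed=0):
    """Return the sampled candidate with the smallest loss, and that loss."""
    rng = random.Random(seed)
    best, best_loss = None, float("inf")
    for _ in range(samples):
        candidate = sample_candidate(guarantee, capacity, rng)
        candidate_loss = loss(candidate, weights, requests, total)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss


# Made-up example: 3 queues, 2 resources (cpu cores, memory GiB).
guarantee = [[1, 2], [2, 4], [0, 0]]
capacity = [[8, 16], [8, 16], [4, 8]]
weights = [1, 2, 1]
requests = [[4, 8], [6, 12], [2, 4]]
total = [12, 24]

best, best_loss = pick_deserved(guarantee, capacity, weights, requests, total)
```

Because every candidate is sampled inside `[guarantee, capacity]`, the hard constraints hold by construction, and the loss only has to rank candidates by how well they fit the weights and requests. How the two terms should be weighted against each other is left open here.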
**What happened**: I just downloaded the master branch of Volcano and installed it from the Helm chart. All of the Volcano-related pods are running, but when I run the example...
As the [Rescheduling Design Doc](https://github.com/volcano-sh/volcano/blob/master/docs/design/rescheduling.md) describes and the [rescheduling plugin](https://github.com/volcano-sh/volcano/pull/2184) currently implements, evicting the selected victims without distinguishing between them is somewhat crude. In fact, we should consider more factors...
#### What would you like to be added: Support Flink with Volcano natively. #### Why is this needed: Currently Flink uses pod-level scheduling and also the resource sharing capability...
Resolved #2332. Signed-off-by: chanhz