Thor Wu

Results 169 comments of Thor Wu

@haiker2011 Thanks for your report. So the current logic is not robust, in that it is not aware of why the job was restarted (due to a manual operation or a job failure), right?

@ldd91 May I kindly ask whether there is any progress on this issue?

@sunyulin728 Hi, I think I understand your scenario now, and I agree that the requirement is reasonable. Unfortunately, the `binpack` plugin only considers the...

Sounds interesting! But it may be more complex than the given design. For example, network delay varies between different nodes, and it also varies over time for the same node. Maybe considering...

Thanks for your report and debugging. The analysis is helpful, and we will fix it as soon as possible.

Requesting more input on how much memory should be treated as one block (the default is 1 MB) so that the value is suitable for all specified GPU cards.

> 100MB per block may work fine. Inference services usually cost hundreds to thousands of MB of memory (training services usually cost much more than this scale), so we actually do not care...
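To make the trade-off concrete, here is a minimal sketch (not the project's actual code; `blocks_needed` is a hypothetical helper) showing how requests get rounded up to whole blocks, which is why block size matters: a larger block simplifies accounting but wastes up to one block per request.

```python
def blocks_needed(request_mb: int, block_mb: int) -> int:
    """Whole blocks needed to cover a memory request (ceiling division)."""
    return -(-request_mb // block_mb)

# A 1500 MB request needs 1500 blocks at 1 MB granularity,
# but only 15 blocks at 100 MB granularity.
print(blocks_needed(1500, 1))    # 1500
print(blocks_needed(1500, 100))  # 15
# Rounding up can waste up to block_mb - 1 MB per request:
print(blocks_needed(1501, 100))  # 16 (99 MB of the last block unused)
```

Since inference workloads typically request hundreds to thousands of MB, the waste from a 100 MB block stays small relative to the request size, which is the argument for a coarser default.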

> Is this issue resolved at present?

Not yet. We are considering a graceful way to make the fix without modifying the gRPC directly.

> Any update for this issue?

Not yet. I'm sorry, I have been developing another feature recently. Will fix it ASAP.