Primary machine meets her demise by performing large work

Open vorj opened this issue 6 years ago • 4 comments

We, the Fixstars developers, develop ClPy on several machines. Currently, we mainly use the Primary machine (equipped with an AMD Radeon Vega) and the Secondary machine (equipped with an NVIDIA TITAN V).

However, the GPU driver of the Primary machine ~frequently~ crashes when performing large work. We have to press the reset button every time the machine goes down. We should fix this problem.

Related issue: #108

vorj, Feb 28 '19 07:02

@vorj As you reported on https://github.com/fixstars/clpy/pull/180#issuecomment-464996117 and https://github.com/fixstars/clpy/issues/108#issuecomment-448963157, the problem seems to be triggered by performing large work, not to occur frequently.

If you know of other situations that trigger it without large work, please report them. Otherwise, do not exaggerate the problem.

LWisteria, Feb 28 '19 09:02

@LWisteria Currently we are working on passing some test cases that contain large work, so the developers have been hit by this problem repeatedly. That is why I felt it occurred frequently. However, the problem is fairly reproducible, and we can basically avoid it by being careful when executing those test cases. Additionally, the current CI tasks don't contain the problematic large work, so we don't always hit the problem.

Anyway, the description is not good (because it is based on my feelings), so I'll fix it.

vorj, Feb 28 '19 09:02

@vorj We must discuss technology and engineering, never your subjective feelings.

LWisteria, Mar 01 '19 02:03

@yuk-to I hear you investigated this problem. Is it caused by ClPy, or is it machine-specific? The old primary machine (furyx) doesn't die even under heavy workloads. We can close this issue if the cause is not ClPy itself. I need you to report your findings.

LWisteria, Mar 05 '19 02:03