clpy
Primary machine meets her demise by performing large work
We, the Fixstars developers, develop ClPy on several machines. In particular, we currently use the Primary machine (with an AMD Radeon Vega) and the Secondary machine (with an NVIDIA TITAN V).
However, the GPU driver of the Primary machine meets its demise ~frequently~ when performing large work. We need to push the reset button every time the machine has passed away. We should fix this problem.
Related issue: #108
@vorj As you reported on https://github.com/fixstars/clpy/pull/180#issuecomment-464996117 and https://github.com/fixstars/clpy/issues/108#issuecomment-448963157, the problem seems to be caused by performing large work; it does not occur frequently.
Please report any other situations where it occurs without large work, if you know of any. Otherwise, do not overstate the problem.
@LWisteria Currently we are working on passing some test cases that contain large work. Because of that, the developers have been hit by this problem repeatedly, so I feel like it occurs frequently. However, the problem is reproducible to some extent, and we can basically avoid it by being careful about how we execute the test cases. Additionally, the current CI tasks don't contain the problematic large work, so we don't always hit the problem.
Anyway, the description is not good (because it is based on my feelings), so I'll fix it.
@vorj We must discuss technology and engineering, never your subjective feelings.
@yuk-to I hear you investigated this problem. Is it caused by ClPy, or is it machine-specific? The old primary machine (furyx) won't die even if the workload is heavy. We can close this issue if it is not caused by ClPy itself. I need you to report