akii96
Hi @Jacob0226! You need to pass a nonzero value for at least the "iters" argument (otherwise a division by zero produces the inf that you see). I would also pass some nonzero value...
Hi @amazingguni, just a sanity check to confirm our understanding of the issue: does the checkpoint/restore functionality work when you have a container with no GPU devices whatsoever? I...
Thanks for the additional context @amazingguni 👍 Can you still confirm for us that the checkpoint/restore functionality works when you have a container with **no gpu devices** in your demo-report.yaml...
@amazingguni sorry, it took some time to get answers from colleagues, but yeah, testing the no-GPU container would be difficult. However, I got a second ask from the team,...
I do not have access to GPUs with elevated privileges to test this myself (e.g. to apply the driver patch :/ ), so I am completely relying on our...