
Error while running sagemaker-debugger with custom pytorch container and custom model

Open aditya5558 opened this issue 3 years ago • 0 comments

Hi,

I am getting the error below when running sagemaker-debugger with a custom PyTorch container and a custom model, outside of SageMaker training. I added hooks to my model and loss, and the error is raised when I run my training code:

FileNotFoundError: [Errno 2] No such file or directory: 'smdebug_outputs/collections/000000000/worker_0_collections.json.tmp'

where 'smdebug_outputs' is the output directory given.
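
For reference, the same errno can be reproduced without smdebug. The sketch below uses only the standard library and a hypothetical temporary directory that mirrors the path in the traceback. It shows that writing a file whose parent directories do not exist raises FileNotFoundError, and that the write succeeds once they are created. Whether this is what actually happens inside smdebug is an assumption I have not confirmed.

```python
import os
import tempfile


def try_write(path):
    """Try to create `path`; return the exception class name on failure, else None."""
    try:
        with open(path, "w") as f:
            f.write("{}")
        return None
    except OSError as exc:
        return type(exc).__name__


# Hypothetical layout mirroring the path in the traceback (not smdebug's own code).
base = tempfile.mkdtemp()
target = os.path.join(
    base, "smdebug_outputs", "collections", "000000000",
    "worker_0_collections.json.tmp",
)

before = try_write(target)  # parent directories do not exist yet
os.makedirs(os.path.dirname(target), exist_ok=True)
after = try_write(target)   # parent directories now exist

print("before makedirs:", before)
print("after makedirs:", after)
```

Since 'smdebug_outputs' is a relative path, it is resolved against the current working directory at the time of the write, so a changed working directory could produce the same symptom.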

I added the following snippet to my code to register the hooks:

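import smdebug.pytorch as smd

out_dir = "smdebug_outputs"  # output directory passed to the hook
hook = smd.Hook(out_dir)
hook.register_module(net)  # net is my custom model

# Inside training loop
loss = net(inputs)  # the forward pass returns the loss
hook.record_tensor_value(tensor_name="loss", tensor_value=loss)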

Are there any other modifications needed to get sagemaker-debugger running with a custom model and container?

aditya5558, Mar 29 '21 21:03