sagemaker-debugger
Error while running sagemaker-debugger with custom pytorch container and custom model
Hi,
I am hitting the error below when running sagemaker-debugger (smdebug) with a custom PyTorch container and a custom model, outside of SageMaker training. I registered hooks on my model and loss using the statements further down, but when I run my training code it fails with:
FileNotFoundError: [Errno 2] No such file or directory: 'smdebug_outputs/collections/000000000/worker_0_collections.json.tmp'
where 'smdebug_outputs' is the output directory I passed to the hook as out_dir.
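To rule out a plain filesystem problem (a relative path resolving somewhere unexpected, or a location that is not writable), a check like the one below could be run from the same working directory as the training script. It is only a sketch using the Python standard library, not smdebug code: the function name `probe_output_location` is hypothetical, and the write-to-`.tmp`-then-rename pattern is an assumption inferred from the `.json.tmp` suffix in the error path, not something confirmed from the library.

```python
import os
import shutil
import tempfile


def probe_output_location(out_dir):
    """Check that nested files can be written next to out_dir.

    Hypothetical helper, not part of smdebug. It never creates out_dir
    itself; all probe files live in a throwaway sibling directory.
    """
    abs_out_dir = os.path.abspath(out_dir)
    parent = os.path.dirname(abs_out_dir)
    os.makedirs(parent, exist_ok=True)

    # Scratch directory beside out_dir, removed again in the finally block.
    probe_root = tempfile.mkdtemp(prefix="probe_", dir=parent)
    try:
        # Mirror the nested layout seen in the error path (assumption).
        nested = os.path.join(probe_root, "collections", "000000000")
        os.makedirs(nested)

        tmp_path = os.path.join(nested, "probe.json.tmp")
        final_path = os.path.join(nested, "probe.json")

        # Write to a temporary name, then rename it into place.
        with open(tmp_path, "w") as f:
            f.write("{}")
        os.replace(tmp_path, final_path)

        if not os.path.exists(final_path):
            raise OSError("rename did not produce " + final_path)
    finally:
        shutil.rmtree(probe_root, ignore_errors=True)

    return abs_out_dir
```

Calling `probe_output_location("smdebug_outputs")` returns the absolute path that the relative name resolves to, or raises an `OSError` if the location cannot be written. If it succeeds, the problem is more likely in how the hook is set up than in the filesystem.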
I added the following snippet to my code to register the hooks:
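import smdebug.pytorch as smd

# out_dir is "smdebug_outputs", the directory named in the error above
hook = smd.Hook(out_dir)
# Register the hook on the model
hook.register_module(net)
# Inside training loop
loss = net(inputs)
# Record the loss value explicitly under the name "loss"
hook.record_tensor_value(tensor_name="loss", tensor_value=loss)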
Are there other modifications needed to get sagemaker-debugger running with a custom model and container?