sagemaker-debugger
Loss Tensors Are Saved Twice On AWS Pytorch
Loss values computed through torch.nn.functional are saved twice for each step with AWS PyTorch.
This happens because functional losses are already saved by default by the post_hook_for_loss_functional function in AWS PyTorch.
...
for _ in range(n_steps):
    optimizer.zero_grad()
    outputs = net(inputs)
    # First save: the post_hook_for_loss_functional registered on F.cross_entropy
    # records this loss value automatically.
    loss = F.cross_entropy(outputs, labels)
    # Second save: the same loss value is recorded again manually.
    hook.record_tensor_value("nll_loss", tensor_value=loss)
    loss.backward()
    optimizer.step()
...
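One way to confirm the duplication is to inspect what was written with smdebug's trial API. This is a minimal sketch; the output path ./smdebug_out is an assumed value for the hook's out_dir, and the exact tensor names depend on the smdebug version:

from smdebug.trials import create_trial

# Load whatever the hook wrote during training (the path is an assumption).
trial = create_trial("./smdebug_out")

# Expect to see both the automatically saved functional loss and "nll_loss".
print(trial.tensor_names())
for name in trial.tensor_names():
    # Each name lists the steps at which a value was recorded.
    print(name, trial.tensor(name).steps())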
post_hook_for_loss_functional is invoked by the call to F.cross_entropy(outputs, labels), so the loss is saved once by the hook and a second time by record_tensor_value.
This happens when, as in the test https://github.com/awslabs/sagemaker-debugger/blob/master/tests/pytorch/test_loss.py#L61, a user runs with AWS PyTorch but also modifies the training script to save the loss manually.
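If the goal is to record the loss only once, a simple workaround under these assumptions is to drop the manual record_tensor_value call and rely on the default functional-loss saving (sketch only; net, inputs, labels, optimizer, and hook are set up as in the snippet above):

for _ in range(n_steps):
    optimizer.zero_grad()
    outputs = net(inputs)
    # Saved once by the default post_hook_for_loss_functional.
    loss = F.cross_entropy(outputs, labels)
    loss.backward()
    optimizer.step()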