dlwpt-code
p1ch8/1_convolution.ipynb L2 regularization problem
I've been working my way through the Jupyter notebook for Chapter 8. When I run the cell that trains with L2 regularization:
model = Net().to(device=device)
optimizer = optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop_l2reg(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

all_acc_dict["l2 reg"] = validate(model, train_loader, val_loader)
the network does not train: the loss is NaN. I am wondering whether there is an error in the definition of training_loop_l2reg in the previous cell:
def training_loop_l2reg(n_epochs, optimizer, model, loss_fn,
                        train_loader):
    for epoch in range(1, n_epochs + 1):
        loss_train = 0.0
        for imgs, labels in train_loader:
            imgs = imgs.to(device=device)
            labels = labels.to(device=device)
            outputs = model(imgs)
            loss = loss_fn(outputs, labels)
            l2_lambda = 0.001
            # Replace pow(2.0) with abs() for L1 regularization
            l2_norm = sum(p.pow(2.0).sum()
                          for p in model.parameters())
            loss = loss + l2_lambda * l2_norm
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()
        if epoch == 1 or epoch % 10 == 0:
            print('{} Epoch {}, Training loss {}'.format(
                datetime.datetime.now(), epoch,
                loss_train / len(train_loader)))
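
As a sanity check on the math, here is a minimal, self-contained sketch I put together (a toy nn.Linear model and random data, nothing from the notebook) to compare the manual penalty against SGD's weight_decay. Since the gradient of l2_lambda * p.pow(2.0).sum() with respect to p is 2 * l2_lambda * p, the two updates should coincide when weight_decay = 2 * l2_lambda:

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
x = torch.randn(8, 4)  # toy inputs, stand-ins for the notebook's images
y = torch.randn(8, 1)  # toy targets

def one_step(manual_penalty):
    torch.manual_seed(0)     # identical initial weights on both runs
    model = nn.Linear(4, 1)  # hypothetical toy model, not the notebook's Net
    l2_lambda = 0.001
    wd = 0.0 if manual_penalty else 2 * l2_lambda
    optimizer = optim.SGD(model.parameters(), lr=1e-2, weight_decay=wd)
    loss = nn.functional.mse_loss(model(x), y)
    if manual_penalty:
        # Same penalty as in training_loop_l2reg above
        loss = loss + l2_lambda * sum(p.pow(2.0).sum()
                                      for p in model.parameters())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return torch.cat([p.detach().flatten() for p in model.parameters()])

print(torch.allclose(one_step(True), one_step(False)))  # expected: True

If the loop is correct, this should print True, which would suggest the penalty formulation itself is not the bug.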
However, if I instead train using the weight_decay parameter of optim.SGD:
model = NetWidth(n_chans1=32).to(device=device)
optimizer = optim.SGD(model.parameters(), weight_decay=0.001, lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

training_loop(
    n_epochs = 100,
    optimizer = optimizer,
    model = model,
    loss_fn = loss_fn,
    train_loader = train_loader,
)

all_acc_dict["width"] = validate(model, train_loader, val_loader)
the loss converges without any problem.
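
In case it helps narrow things down, this is roughly how I have been instrumenting the manual-penalty loop (reusing the model, optimizer, loss_fn, train_loader, and device names from the cells above, all assumed to be defined already) to see whether the data loss or the L2 term goes non-finite first:

import torch

# Diagnostic sketch, not notebook code: run batches until the combined
# loss stops being finite, then report both terms before any bad update.
for imgs, labels in train_loader:
    imgs = imgs.to(device=device)
    labels = labels.to(device=device)
    data_loss = loss_fn(model(imgs), labels)
    l2_norm = sum(p.pow(2.0).sum() for p in model.parameters())
    loss = data_loss + 0.001 * l2_norm
    if not torch.isfinite(loss):
        print('data loss:', data_loss.item(), 'l2 norm:', l2_norm.item())
        break
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()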