sdk
TypeError: conv2d() received an invalid combination of arguments
Hi,
I want to train a model with the Torchvision library and track the training results with the Layer library. My training code works in Colab (Marul_Notebook).
I get an error when I add the Layer library:
import torch
from torch import nn
from torch import optim
from torchvision import transforms, models
from collections import OrderedDict
from layer.decorators import model
import layer
import torchvision

layer.login()
layer.init("marul-classification")

train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       transforms.Normalize([0.485, 0.456, 0.406],
                                                            [0.229, 0.224, 0.225])
                                       ])
train_data = torchvision.datasets.ImageFolder(root="./train/", transform=train_transforms)
train_data_loader = torch.utils.data.DataLoader(train_data, batch_size=32, shuffle=True)  # try num_workers=2 later
dataiter = iter(train_data_loader)
images, labels = next(dataiter)

model = models.densenet121(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 512)),
    ('relu', nn.LeakyReLU()),
    ('fc2', nn.Linear(512, 3)),
    ('output', nn.LogSoftmax(dim=1))
]))
model.classifier = classifier
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
criterion = nn.CrossEntropyLoss().to(device)
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)

@model("my_first_model")
def train_model(model, optimizer, n_epochs, criterion):
    import time
    start_time = time.time()
    for epoch in range(1, n_epochs+1):
        epoch_time = time.time()
        epoch_loss = 0
        correct = 0
        total = 0
        print("Epoch {} / {}".format(epoch, n_epochs))
        model.train()
        for inputs, labels in train_data_loader:
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()  # zeroed grads
            outputs = model(inputs)  # forward pass
            loss = criterion(outputs, labels)  # softmax + cross entropy
            loss.backward()  # back pass
            optimizer.step()  # updated params
            epoch_loss += loss.item()  # train loss
            _, pred = torch.max(outputs, dim=1)
            correct += (pred.cpu() == labels.cpu()).sum().item()
            total += labels.shape[0]
        acc = correct / total
        model.eval()
        a = 0
        pred_val = 0
        correct_val = 0
        total_val = 0
        with torch.no_grad():
            for inp_val, lab_val in train_data_loader:
                inp_val = inp_val.to(device)
                lab_val = lab_val.to(device)
                out_val = model(inp_val)
                loss_val = criterion(out_val, lab_val)
                a += loss_val.item()
                _, pred_val = torch.max(out_val, dim=1)
                correct_val += (pred_val.cpu() == lab_val.cpu()).sum().item()
                total_val += lab_val.shape[0]
            acc_val = correct_val / total_val
        epoch_time2 = time.time()
        # Losses are summed per batch, so divide by the number of batches to get the mean
        print("Duration: {:.0f}s, Train Loss: {:.4f}, Train Acc: {:.4f}, Val Loss: {:.4f}, Val Acc: {:.4f}"
              .format(epoch_time2-epoch_time, epoch_loss/len(train_data_loader), acc, a/len(train_data_loader), acc_val))
    end_time = time.time()
    print("Total Time:{:.0f}s".format(end_time-start_time))

layer.run([train_model(model, optimizer, 50, criterion)])
Error Message:
TypeError: conv2d() received an invalid combination of arguments - got (str, Parameter, NoneType, tuple, tuple, tuple, int), but expected one of:
* (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, tuple of ints padding, tuple of ints dilation, int groups)
didn't match because some of the arguments have invalid types: (str, Parameter, NoneType, tuple, tuple, tuple, int)
* (Tensor input, Tensor weight, Tensor bias, tuple of ints stride, str padding, tuple of ints dilation, int groups)
didn't match because some of the arguments have invalid types: (str, Parameter, NoneType, tuple, tuple, tuple, int)
Why am I getting this error? Could you help?
Hi @kadirnar
I don't think the Layer SDK supports this kind of usage.
layer.run expects a function. What you are passing is a call (invocation) of a function, so it can only work if your function itself, train_model, returns a callable.
Can you try to refactor your function to either return a callable or not accept parameters explicitly?
Maybe you can rewrite it to something like:
@model
def train_model():
    model = ...  # init model
    optimizer = ...  # init optimiser
    n_epochs = 50
    criterion = ...  # criterion

    def train_inner(model, optimizer, n_epochs, criterion):
        ...  # actual training

    return train_inner(model, optimizer, n_epochs, criterion)

layer.run([train_model])
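To illustrate why the parentheses matter, here is a minimal pure-Python sketch. The run function below is a hypothetical stand-in, not Layer's real layer.run; it models only the property described above, namely that it expects callables and invokes them itself. Passing train_model hands over the function, while passing train_model() runs the training immediately in the current process and hands over only its return value.

```python
from typing import Callable, List


def run(functions: List[Callable[[], object]]) -> List[object]:
    """Hypothetical stand-in for a runner such as layer.run.

    It expects callables and decides itself when to invoke them.
    """
    results = []
    for fn in functions:
        if not callable(fn):
            raise TypeError(f"expected a callable, got {type(fn).__name__}")
        results.append(fn())
    return results


calls: List[str] = []  # records every time the inner training step actually runs


def train_model() -> str:
    # Zero-argument function: everything it needs is built inside it,
    # following the refactor suggested above (values are placeholders).
    n_epochs = 50

    def train_inner(n_epochs: int) -> str:
        calls.append("train_inner")
        return f"trained for {n_epochs} epochs"

    return train_inner(n_epochs)


# Passing the function: the runner decides when to call it.
print(run([train_model]))  # ['trained for 50 epochs']

# Passing the call: train_model() runs right here, before run() sees anything,
# and run() only receives a string.
try:
    run([train_model()])
except TypeError as exc:
    print(exc)  # expected a callable, got str
```

The sketch does not model remote execution; it only shows that with train_model() the work has already happened locally by the time the runner receives anything.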
Hi @yuranos,
Thank you for solving the problem. What should I do to use my GPU? Is GPU support public?
When I add this code to the train function, I get a CUDA error.
@fabric("f-gpu-small")
@model("my_first_model")
def train_model():
    ...

layer.run([train_model()])
Error Message:
File "/home/kadir/miniconda3/envs/layer/lib/python3.8/site-packages/torch/cuda/__init__.py", line 211, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
⠙ my_first_model ━━━━━━╸━━━ TRAINING [0:00:20]
My code works on the CPU, but I couldn't get it to run on the GPU. Can you help me?
Hey @kadirnar, instead of:
layer.run([train_model()])
can you try:
layer.run([train_model])
When you call train_model(), it executes that function locally, and that seems to be where your code is failing, since the traceback refers to /home/kadir/.
layer.init("marul-classification",pip_packages=['torchvision','torch','QuantStub'])
Error Message:
09:12:10 my_first_model: ModuleNotFoundError: No module named 'torchvision'
Hmm, not sure why that's not working, but instead of using pip_packages on layer.init, can you please try putting it on @model? Like so:
@fabric("f-gpu-small")
@model("my_first_model", pip_packages=['torchvision','torch','QuantStub'])
def train_model():
...
layer.run([train_model])
Error Message:
@model("my_first_model", pip_packages=['torchvision','torch','QuantStub'])
TypeError: model() got an unexpected keyword argument 'pip_packages'
Solution:
layer.init("marul-classification",pip_packages=['torchvision'])
I want to use a dataset that is stored locally, but I get this error:
19:37:40 my_first_model: FileNotFoundError: [Errno 2] No such file or directory: 'train/'
Folder structure:
Main/
  train/    (image folder)
  train.py
Cool, progress! Can you try adding a @resources decorator? It is documented here: https://docs.app.layer.ai/docs/sdk-library/resources-decorator.
In your case, I think this should work:
@fabric("f-gpu-small")
@model("my_first_model")
@resources("train/")
def train_model():
...
layer.run([train_model])
Thank you, that fixed the error, but now I am getting a new one:
⠧ my_first_model ━╸━━━━━━━━ UPLOADING [94/757 files, 37 MB/234 MB, 1.8 MB/s, 0:01:47]
....
.....
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
Can you create documentation for error messages? @mwitiderrick @volkangurel
@kadirnar are you still experiencing this error? @mjbcopland can you take a look?
Hi @kadirnar, does it reliably fail/disconnect in the same place when uploading? Does it work with a smaller subset of the training resources rather than all 757 files?
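One way to run the smaller-subset test suggested above is to copy a few images per class into a separate folder that keeps the ImageFolder layout (train/&lt;class&gt;/&lt;image&gt;), so the training code can read it unchanged. A minimal standard-library sketch; the function name make_subset, the per_class count, and the output folder name train_small/ are hypothetical choices, and the resulting folder would then be referenced in place of train/ (in @resources and in ImageFolder).

```python
import shutil
from pathlib import Path


def make_subset(src: str, dst: str, per_class: int) -> int:
    """Copy at most per_class files from each class subfolder of src into dst.

    Keeps the ImageFolder layout (dst/<class>/<file>) so the same training
    code can read the smaller folder. Returns the number of files copied.
    """
    src_path, dst_path = Path(src), Path(dst)
    copied = 0
    for class_dir in sorted(p for p in src_path.iterdir() if p.is_dir()):
        # Sort for a deterministic pick, then keep the first per_class files.
        files = sorted(f for f in class_dir.iterdir() if f.is_file())[:per_class]
        target = dst_path / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, target / f.name)  # copy2 also preserves file metadata
            copied += 1
    return copied
```

For example, make_subset("train", "train_small", per_class=20) would build a much smaller upload from the 757-file folder; if that uploads reliably, the disconnect is more likely related to the size or duration of the upload than to the code.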
I no longer use Linux. Does the Layer library work on Windows? https://github.com/layerai/sdk/issues/97#issuecomment-1160262151
from layer.decorators import model ... model = models.densenet121(pretrained=True) ... model.classifier = classifier ... model.to(device) ...
@model("my_first_model") ... layer.run([train_model(model, optimizer,50, criterion)])
@kadirnar there is a variable naming issue with model. You are using the same name for the model decorator imported from layer.decorators and for your nn.Module (the DenseNet network). That's the reason you are getting these errors:
09:12:10 my_first_model: ModuleNotFoundError: No module named 'torchvision'
TypeError: model() got an unexpected keyword argument 'pip_packages'
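The conv2d error in the original question follows directly from this naming clash: the script imports the decorator as model and then rebinds model to the DenseNet network, so @model("my_first_model") calls the network on the string "my_first_model", and that string reaches conv2d as its input. A minimal pure-Python sketch of the same shadowing; the decorator factory and the FakeNet class below are hypothetical stand-ins, not Layer or PyTorch code.

```python
from typing import Callable


def model(name: str) -> Callable[[Callable], Callable]:
    """Hypothetical stand-in for a decorator factory such as layer.decorators.model."""
    def wrap(fn: Callable) -> Callable:
        fn.registered_name = name  # record the name the function was registered under
        return fn
    return wrap


register_model = model  # alias taken BEFORE the name is rebound


class FakeNet:
    """Hypothetical stand-in for an nn.Module: calling it runs a forward pass."""

    def __call__(self, x):
        if not isinstance(x, (int, float)):
            # Mirrors conv2d rejecting a str where a Tensor is expected.
            raise TypeError(f"forward() got {type(x).__name__}, expected a number")
        return x * 2


model = FakeNet()  # rebinding: 'model' now refers to the network, not the decorator


def define_with_shadowed_name() -> str:
    try:
        @model("my_first_model")  # evaluates FakeNet()("my_first_model")
        def train_model():
            return "trained"
    except TypeError as exc:
        return str(exc)
    return "no error"


@register_model("my_first_model")  # the alias still points at the decorator factory
def train_model():
    return "trained"


print(define_with_shadowed_name())  # forward() got str, expected a number
print(train_model.registered_name)  # my_first_model
```

In the real script, the equivalent fix would be to import the decorator under another name (for example from layer.decorators import model as layer_model, then @layer_model("my_first_model")) or to give the network variable a different name, such as net.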
Currently I only have Windows installed; I will reinstall Ubuntu to try it. I still don't know how to solve the error, though. At the moment I am not getting any package-installation errors; I am getting a connection-related error while uploading data.
Note: I no longer have the code file because of the OS change, so I sent the last saved image of the code. The latest version of the code: