intel-extension-for-pytorch icon indicating copy to clipboard operation
intel-extension-for-pytorch copied to clipboard

RuntimeError: 0 <= device.index() && device.index() < static_cast<c10::DeviceIndex>(device_ready_queues_.size()) INTERNAL ASSERT FAILED at "/build/pytorch/torch/csrc/autograd/engine.cpp":1418

Open SoldierWz opened this issue 11 months ago • 3 comments

Describe the bug

When this problem occurred, I tried to disable the CPU core, and then I could run normally, but the running results were very poor, the accuracy dropped sharply and the training time became longer. I have submitted this issue #565. Then when I restored the CPU core, the above error occurred. Here is the part where the problem occurs. device = 'xpu' for train_idx, test_idx in kf.split(X_tensor): X_train, X_test = X_tensor[train_idx], X_tensor[test_idx] y_train, y_test = y_tensor[train_idx], y_tensor[test_idx]

train_dataset = CustomDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

model = MLP(X_train.shape[1]) 
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

model = model.to("xpu")
criterion = criterion.to("xpu")
model, optimizer = ipex.optimize(model, optimizer=optimizer)
for epoch in range(1000):
    model.train() 
    for features, labels in train_loader:
        features, labels = features.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(features)
        loss = criterion(outputs, labels)
        **loss.backward()**
        optimizer.step()

Versions

wget https://raw.githubusercontent.com/intel/intel-extension-for-pytorch/master/scripts/collect_env.py

For security purposes, please check the contents of collect_env.py before running it.

python collect_env.py

SoldierWz avatar Mar 26 '24 06:03 SoldierWz