
scipy.sparse.linalg._eigen.arpack.arpack.ArpackError

Open kou18n opened this issue 2 years ago • 6 comments

🐛 Describe the bug

transform = T.Compose([T.RemoveIsolatedNodes(), T.AddSelfLoops(), T.AddRandomWalkPE(20, attr_name='x'), T.ToSparseTensor()])

The T.AddRandomWalkPE() transform works fine when I train the GIN model.

transform = T.Compose([T.RemoveIsolatedNodes(), T.AddSelfLoops(), T.AddLaplacianEigenvectorPE(3, attr_name='x'), T.ToSparseTensor()])

But T.AddLaplacianEigenvectorPE raises the following error.

Please help me, thank you!

Namespace(dataset='MalNetTiny', batch_size=256, hidden_channels=64, num_layers=5, lr=0.0001, epochs=500, wandb='True', transform='LEPE')
cuda
Use LEPE node feature!
Traceback (most recent call last):
  File "/home/xxx/Projects/new_ideas/main.py", line 121, in <module>
    model = Net(train_dataset.num_features, args.hidden_channels, train_dataset.num_classes, args.num_layers).to(device)
  File "/home/xxx/miniconda3/envs/malnet/lib/python3.10/site-packages/torch_geometric/data/in_memory_dataset.py", line 66, in num_classes
    return super().num_classes
  File "/home/xxx/miniconda3/envs/malnet/lib/python3.10/site-packages/torch_geometric/data/dataset.py", line 159, in num_classes
    data_list = _get_flattened_data_list([data for data in self])
  File "/home/xxx/miniconda3/envs/malnet/lib/python3.10/site-packages/torch_geometric/data/dataset.py", line 159, in <listcomp>
    data_list = _get_flattened_data_list([data for data in self])
  File "/home/xxx/miniconda3/envs/malnet/lib/python3.10/site-packages/torch_geometric/data/dataset.py", line 259, in __getitem__
    data = data if self.transform is None else self.transform(data)
  File "/home/xxx/miniconda3/envs/malnet/lib/python3.10/site-packages/torch_geometric/transforms/compose.py", line 24, in __call__
    data = transform(data)
  File "/home/xxx/miniconda3/envs/malnet/lib/python3.10/site-packages/torch_geometric/transforms/add_positional_encoding.py", line 79, in __call__
    eig_vals, eig_vecs = eig_fn(
  File "/home/xxx/miniconda3/envs/malnet/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 1354, in eigs
    return params.extract(return_eigenvectors)
  File "/home/xxx/miniconda3/envs/malnet/lib/python3.10/site-packages/scipy/sparse/linalg/_eigen/arpack/arpack.py", line 782, in extract
    raise ArpackError(ierr, infodict=self.extract_infodict)
scipy.sparse.linalg._eigen.arpack.arpack.ArpackError: ARPACK error 1: The Schur form computed by LAPACK routine slahqr could not be reordered by LAPACK routine strsen . Re-enter subroutine dneupd  with IPARAM(5)=NCV and increase the size of the arrays DR and DI to have dimension at least dimension NCV and allocate at least NCV columns for Z. NOTE: Not necessary if Z and V share the same space. Please notify the authors if this error occurs.

Environment

No response

kou18n avatar Apr 15 '23 07:04 kou18n

Sorry for the late response. Do you have a minimal example to reproduce? This looks dataset-specific, as I cannot reproduce it in a test.

rusty1s avatar Apr 19 '23 14:04 rusty1s

Thank you for your response. Here is a minimal example.

The first Net model gets the following error:

TypeError: Cannot use scipy.linalg.eig for sparse A with k >= N - 1. Use scipy.linalg.eig(A.toarray()) or reduce k.
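
(For what it's worth, this first error seems to be scipy's guard against requesting too many eigenvectors of a small matrix: eigs requires k < N - 1, and the transform asks for k+1 eigenpairs. A minimal sketch, independent of PyG:)

import scipy.sparse as sp
from scipy.sparse.linalg import eigs

A = sp.identity(4, format='csr')  # a tiny 4-node "graph": N = 4
eigs(A, k=3)  # k >= N - 1 -> TypeError: Cannot use scipy.linalg.eig for sparse A ...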

The second Net model gets the following error:

ArpackError: ARPACK error 1: The Schur form computed by LAPACK routine slahqr could not be reordered by LAPACK routine strsen . Re-enter subroutine dneupd with IPARAM(5)=NCV and increase the size of the arrays DR and DI to have dimension at least dimension NCV and allocate at least NCV columns for Z. NOTE: Not necessary if Z and V share the same space. Please notify the authors if this error occurs.


import torch
from torch_geometric.datasets import MalNetTiny
import torch.nn.functional as F
from torch.nn import BatchNorm1d, Linear, ReLU, Sequential

from torch_geometric.loader import DataLoader
from torch_geometric.logging import log
from torch_geometric.nn import MLP, GINConv, global_add_pool
from torch.nn import BatchNorm1d as BatchNorm

import torch_geometric.transforms as T
from torch_scatter import segment_csr

# AddLaplacianEigenvectorPE does not work
transform = T.Compose([T.RemoveIsolatedNodes(), T.AddSelfLoops(), T.AddLaplacianEigenvectorPE(3, attr_name='x'), T.ToSparseTensor()])
# AddRandomWalkPE works
# transform = T.Compose([T.RemoveIsolatedNodes(), T.AddSelfLoops(), T.AddRandomWalkPE(5, attr_name='x'), T.ToSparseTensor()])

train_dataset = MalNetTiny(root='/home/xxx/Datasets/MalNetTiny', split='train', transform=transform)
val_dataset = MalNetTiny(root='/home/xxx/Datasets/MalNetTiny', split='val', transform=transform)
test_dataset = MalNetTiny(root='/home/xxx/Datasets/MalNetTiny', split='test', transform=transform)

train_loader = DataLoader(train_dataset, 256, shuffle=True, pin_memory=True)
val_loader = DataLoader(val_dataset, 256, shuffle=False)
test_loader = DataLoader(test_dataset, 256, shuffle=False)

# First Net model (the one that triggered the TypeError above):
# class Net(torch.nn.Module):
#     def __init__(self, in_channels, hidden_channels, out_channels, num_layers):
#         super().__init__()

#         self.convs = torch.nn.ModuleList()
#         self.batch_norms = torch.nn.ModuleList()

#         for i in range(num_layers):
#             mlp = Sequential(
#                 Linear(in_channels, 2 * hidden_channels),
#                 BatchNorm(2 * hidden_channels),
#                 ReLU(),
#                 Linear(2 * hidden_channels, hidden_channels),
#             )
#             conv = GINConv(mlp, train_eps=False).jittable()

#             self.convs.append(conv)
#             self.batch_norms.append(BatchNorm(hidden_channels))

#             in_channels = hidden_channels

#         self.lin1 = Linear(hidden_channels, hidden_channels)
#         self.batch_norm1 = BatchNorm(hidden_channels)
#         self.lin2 = Linear(hidden_channels, out_channels)

#     def forward(self, x, adj_t, batch):
#         for conv, batch_norm in zip(self.convs, self.batch_norms):
#             x = F.relu(batch_norm(conv(x, adj_t)))
        
#         #x = global_add_pool(x, batch)
#         x = segment_csr(x, batch) 

#         x = F.relu(self.batch_norm1(self.lin1(x)))
#         x = F.dropout(x, p=0.5, training=self.training)
#         x = self.lin2(x)
#         return F.log_softmax(x, dim=-1)

# Second Net model (the one that triggered the ArpackError above):
class Net(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, num_layers):
        super().__init__()

        self.convs = torch.nn.ModuleList()
        for _ in range(num_layers):
            mlp = MLP([in_channels, hidden_channels, hidden_channels])
            self.convs.append(GINConv(nn=mlp, train_eps=False))
            in_channels = hidden_channels

        self.mlp = MLP([hidden_channels, hidden_channels, out_channels],
                       norm=None, dropout=0.5)

    def forward(self, x, adj_t, ptr):
        for conv in self.convs:
            x = conv(x, adj_t).relu()
        x = segment_csr(x, ptr)  # sum-pools node features per graph
        return self.mlp(x)


num_classes = 5    
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = Net(train_dataset.num_features, 64, num_classes, 5).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)


def train():
    model.train()

    total_loss = 0
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        out = model(data.x, data.adj_t, data.ptr)
        loss = F.cross_entropy(out, data.y)
        loss.backward()
        optimizer.step()
        total_loss += float(loss) * data.num_graphs
    return total_loss / len(train_loader.dataset)

@torch.no_grad()
def test(loader):
    model.eval()

    total_correct = 0
    for data in loader:
        data = data.to(device)
        pred = model(data.x, data.adj_t, data.ptr).argmax(dim=-1)
        total_correct += int((pred == data.y).sum())
    return total_correct / len(loader.dataset)



for epoch in range(1, 5 + 1):
    loss = train()
    test_acc = test(test_loader)
    log(Epoch=epoch, Loss=loss, Test=test_acc)



kou18n avatar Apr 24 '23 10:04 kou18n

I also have a PyTorch Lightning version.

import pytorch_lightning as pl
import torch
import torch.nn.functional as F
from torchmetrics import Accuracy

import torch_geometric.transforms as T
from torch_geometric.data.lightning import LightningDataset
from torch_geometric.datasets import MalNetTiny
from torch_scatter import segment_csr

from pytorch_lightning import seed_everything

torch.set_float32_matmul_precision('high')
import warnings
warnings.filterwarnings('ignore', category=UserWarning, message='TypedStorage is deprecated')
from torch.nn import Linear, ReLU, Sequential
from torch.nn import BatchNorm1d as BatchNorm
from torch_geometric.nn import GINConv



class Model(pl.LightningModule):
    def __init__(self, in_channels: int, out_channels: int,
                 hidden_channels: int = 64, num_layers: int = 5,
                 dropout: float = 0.5):
        super().__init__()

        self.convs = torch.nn.ModuleList()
        self.batch_norms = torch.nn.ModuleList()

        for i in range(num_layers):
            mlp = Sequential(
                Linear(in_channels, 2 * hidden_channels),
                BatchNorm(2 * hidden_channels),
                ReLU(),
                Linear(2 * hidden_channels, hidden_channels),
            )
            conv = GINConv(mlp, train_eps=False)#.jittable()

            self.convs.append(conv)
            self.batch_norms.append(BatchNorm(hidden_channels))

            in_channels = hidden_channels

        self.lin1 = Linear(hidden_channels, hidden_channels)
        self.batch_norm1 = BatchNorm(hidden_channels)
        self.lin2 = Linear(hidden_channels, out_channels)

        self.train_acc = Accuracy(task='multiclass', num_classes=out_channels)
        self.val_acc = Accuracy(task='multiclass', num_classes=out_channels)
        self.test_acc = Accuracy(task='multiclass', num_classes=out_channels)

    def forward(self, x, adj_t, ptr):
        for conv, batch_norm in zip(self.convs, self.batch_norms):
            x = F.relu(batch_norm(conv(x, adj_t)))

        # x = global_add_pool(x, batch)
        x = segment_csr(x, ptr)  # sum-pools node features per graph

        x = F.relu(self.batch_norm1(self.lin1(x)))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin2(x)
        return F.log_softmax(x, dim=-1)



    def training_step(self, data, batch_idx):
        y_hat = self(data.x, data.adj_t, data.ptr)
        loss = F.nll_loss(y_hat, data.y)  # forward returns log-probabilities
        self.train_acc(y_hat.softmax(dim=-1), data.y)
        self.log('train_acc', self.train_acc, prog_bar=True, on_step=False,
                 on_epoch=True)
        return loss

    def validation_step(self, data, batch_idx):
        y_hat = self(data.x, data.adj_t, data.ptr)
        self.val_acc(y_hat.softmax(dim=-1), data.y)
        self.log('val_acc', self.val_acc, prog_bar=True, on_step=False,
                 on_epoch=True)

    def test_step(self, data, batch_idx):
        y_hat = self(data.x, data.adj_t, data.ptr)
        self.test_acc(y_hat.softmax(dim=-1), data.y)
        self.log('test_acc', self.test_acc, prog_bar=True, on_step=False,
                 on_epoch=True)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.0001)


if __name__ == '__main__':

    seed_everything(0, workers=True)
    # transform = T.Compose([T.RemoveIsolatedNodes(), T.AddSelfLoops(), T.LocalDegreeProfile(), T.ToSparseTensor()])
    transform = T.Compose([T.RemoveIsolatedNodes(), T.AddSelfLoops(), T.AddLaplacianEigenvectorPE(3, attr_name='x'), T.ToSparseTensor()])


    train_dataset = MalNetTiny(root='/home/xxx/Datasets/MalNetTiny', split='train', transform=transform).shuffle()
    val_dataset = MalNetTiny(root='/home/xxx/Datasets/MalNetTiny', split='val', transform=transform)
    test_dataset = MalNetTiny(root='/home/xxx/Datasets/MalNetTiny', split='test', transform=transform)



    datamodule = LightningDataset(train_dataset, val_dataset, test_dataset,
                                  batch_size=256, num_workers=14)

    model = Model(test_dataset.num_node_features, test_dataset.num_classes)
    print(model)
    devices = torch.cuda.device_count()
    strategy = pl.strategies.DDPStrategy(accelerator='gpu')
    checkpoint = pl.callbacks.ModelCheckpoint(monitor='val_acc', save_top_k=1,
                                              mode='max')
    trainer = pl.Trainer(strategy=strategy, devices=devices, max_epochs=500,
                         log_every_n_steps=5, callbacks=[checkpoint],
                         deterministic=True)

    trainer.fit(model, datamodule)
    trainer.test(ckpt_path='best', datamodule=datamodule)

Running this raises the same error:

scipy.sparse.linalg._eigen.arpack.arpack.ArpackError: ARPACK error 1: The Schur form computed by LAPACK routine slahqr could not be reordered by LAPACK routine strsen . Re-enter subroutine dneupd with IPARAM(5)=NCV and increase the size of the arrays DR and DI to have dimension at least dimension NCV and allocate at least NCV columns for Z. NOTE: Not necessary if Z and V share the same space. Please notify the authors if this error occurs.

kou18n avatar Apr 24 '23 11:04 kou18n

Thanks for the example. I can reproduce it on example 957:

Data(edge_index=[2, 109], y=[1], num_nodes=51)
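
A quick scan along these lines finds it (a rough sketch; the dataset root is a placeholder):

import torch_geometric.transforms as T
from torch_geometric.datasets import MalNetTiny

transform = T.Compose([T.RemoveIsolatedNodes(), T.AddSelfLoops(),
                       T.AddLaplacianEigenvectorPE(3, attr_name='x')])
dataset = MalNetTiny(root='/tmp/MalNetTiny')  # no transform here

for i, data in enumerate(dataset):
    try:
        transform(data)
    except Exception as e:
        print(i, data, type(e).__name__)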

I am not totally sure why this happens, though; it seems more related to an issue in scipy?
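
For context, the transform essentially builds the symmetric-normalized Laplacian and hands it to scipy's ARPACK wrappers, roughly like this (a simplified sketch of the relevant code path, with a toy graph standing in for example 957):

import torch
from scipy.sparse.linalg import eigs, eigsh
from torch_geometric.data import Data
from torch_geometric.utils import get_laplacian, to_scipy_sparse_matrix

# Toy stand-in for one MalNetTiny graph (a directed 10-cycle):
row = torch.arange(10)
data = Data(edge_index=torch.stack([row, (row + 1) % 10]), num_nodes=10)
k, is_undirected = 3, False  # the transform's settings in this issue

# L = I - D^{-1/2} A D^{-1/2}, then ask ARPACK for the k+1 eigenpairs
# with the smallest (real parts of the) eigenvalues:
edge_index, edge_weight = get_laplacian(data.edge_index,
                                        normalization='sym',
                                        num_nodes=data.num_nodes)
L = to_scipy_sparse_matrix(edge_index, edge_weight, data.num_nodes)

eig_fn = eigsh if is_undirected else eigs  # eigs -> dnaupd/dneupd path
eig_vals, eig_vecs = eig_fn(L, k=k + 1,
                            which='SA' if is_undirected else 'SR',
                            return_eigenvectors=True)

The dneupd mentioned in the error message is the extraction routine of the non-symmetric solver (eigs), so the failing graphs go through the general Arnoldi path rather than the symmetric Lanczos one.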

rusty1s avatar Apr 28 '23 13:04 rusty1s

Yes, I think so. I have no idea how to solve this issue. I also see ARPACK error 3: no shifts could be applied during a cycle of the implicitly restarted Arnoldi iteration.

kou18n avatar May 01 '23 07:05 kou18n

I reproduced the same error. However, if you convert L to dense, it works, so maybe a try/except could switch to dense matrix computations; see the sketch below. It seems like a convergence issue in the Arnoldi iterations and/or an ill-conditioned matrix.
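
Something along these lines could serve as a stop-gap (a rough sketch; DenseFallbackLaplacianPE is a made-up name, and the dense branch just mirrors what the sparse path computes):

import numpy as np
import torch
import torch.nn.functional as F
import torch_geometric.transforms as T
from torch_geometric.utils import get_laplacian, to_scipy_sparse_matrix


class DenseFallbackLaplacianPE(T.BaseTransform):
    """Like AddLaplacianEigenvectorPE, but falls back to a dense
    eigendecomposition when ARPACK fails or the graph is too small."""
    def __init__(self, k, attr_name='x'):
        self.k = k
        self.attr_name = attr_name
        self.sparse = T.AddLaplacianEigenvectorPE(k, attr_name=attr_name)

    def __call__(self, data):
        try:
            return self.sparse(data)
        except Exception:  # ArpackError, or TypeError on tiny graphs
            edge_index, edge_weight = get_laplacian(
                data.edge_index, normalization='sym',
                num_nodes=data.num_nodes)
            L = to_scipy_sparse_matrix(edge_index, edge_weight,
                                       data.num_nodes).toarray()
            eig_vals, eig_vecs = np.linalg.eig(L)
            order = np.real(eig_vals).argsort()  # smallest first, as in 'SR'
            eig_vecs = np.real(eig_vecs[:, order])
            pe = torch.from_numpy(eig_vecs[:, 1:self.k + 1]).float()
            if pe.size(1) < self.k:  # graph has fewer than k+1 nodes
                pe = F.pad(pe, (0, self.k - pe.size(1)))
            data[self.attr_name] = pe
            return data

It drops directly into the failing pipeline, e.g. T.Compose([T.RemoveIsolatedNodes(), T.AddSelfLoops(), DenseFallbackLaplacianPE(3, attr_name='x'), T.ToSparseTensor()]). Alternatively, if the graphs are first made undirected (e.g. via T.ToUndirected()), passing is_undirected=True to AddLaplacianEigenvectorPE switches to the symmetric solver eigsh, which may avoid the dneupd path altogether.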

arijitthegame avatar Aug 15 '25 21:08 arijitthegame