
Classification fails when mixing the STDP optimizer with gradient descent

Open Wang-jun-yu opened this issue 1 year ago • 7 comments

Hi, I am trying to train on my own dataset with the STDP optimizer, but the network fails to classify. Could you help me find out where the problem is?

The output is:

    epoch : 0 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 1 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 2 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 3 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 4 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 5 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469

(My dataset has ten classes in total, and training accuracy stays at about 10%, so the network is effectively not classifying.)

My network structure is:

class CSNN(nn.Module):
    def __init__(self, T: int, channels: int, use_cupy=False):
        super(CSNN, self).__init__()
        self.T = T
        self.conv_fc = nn.Sequential(
            layer.Conv2d(3, channels, kernel_size=3, padding=1, bias=False),
            layer.BatchNorm2d(channels),
            neuron.IFNode(surrogate_function=surrogate.ATan()),
            layer.MaxPool2d(2, 2),  # 14 * 14

            layer.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            layer.BatchNorm2d(channels),
            neuron.IFNode(surrogate_function=surrogate.ATan()),
            layer.MaxPool2d(2, 2),  # 7 * 7

            layer.Flatten(),
            # note: channels * 32 * 32 in-features only matches a 128x128 input after
            # the two 2x2 poolings; the "14 * 14" / "7 * 7" comments above come from
            # the 28x28 tutorial and do not match this Linear
            layer.Linear(channels * 32 * 32, channels * 4 * 4, bias=False),
            neuron.IFNode(surrogate_function=surrogate.ATan()),

            layer.Linear(channels * 4 * 4, 10, bias=False),
            neuron.IFNode(surrogate_function=surrogate.ATan()),
        )
        functional.set_step_mode(self, step_mode='m')

    def __len__(self):
        return len(self.conv_fc)

    def forward(self, x):
        x_seq = x.unsqueeze(0).repeat(self.T, 1, 1, 1, 1)  # [N, C, H, W] -> [T, N, C, H, W]
        x_seq = self.conv_fc(x_seq)
        fr = x_seq.mean(0)  # firing rate averaged over the T time steps
        return fr
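As a sanity check, the input/output shapes can be verified with something like the sketch below (it assumes a 128x128 RGB input, the only spatial size consistent with the channels * 32 * 32 in-features after the two 2x2 poolings):

    import torch

    net = CSNN(T=4, channels=32)
    x = torch.rand(2, 3, 128, 128)   # hypothetical batch: [N, C, H, W]
    fr = net(x)                      # firing rates averaged over T steps
    print(fr.shape)                  # torch.Size([2, 10])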
    
    

The training procedure is:

start_epoch = 0
instances_stdp = (la.Conv2d,)  # la: presumably an alias of spikingjelly's layer module
params_stdp = []

for m in net.modules():
    print('m : ',m)
    if isinstance(m, instances_stdp):
        print('instances_stdp : ',instances_stdp)
        for p in m.parameters():
            params_stdp.append(p)
params_stdp_set = set(params_stdp)
params_gradient_descent = []


for p in net.parameters():
    if p not in params_stdp_set:
        params_gradient_descent.append(p)

optimizer_gd = Adam(params_gradient_descent, lr=0.1)
optimizer_stdp = SGD(params_stdp, lr=0.1, momentum=0.8)
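# note: f_weight is used by the STDPLearner instances below but is not defined in
# this snippet; presumably it is the clamp from the spikingjelly STDP example:
def f_weight(x):
    return torch.clamp(x, -1., 1.)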

stdp_learners = []
for i, layer in enumerate(net.conv_fc):  # note: the loop variable shadows the imported `layer` module inside this loop
    if isinstance(layer, instances_stdp):
        stdp_learners.append(
            learning.STDPLearner(step_mode=args.step_mode, synapse=layer,
                                 # caution: in this network each Conv2d is followed by
                                 # BatchNorm2d, so conv_fc[i + 1] is the BatchNorm layer,
                                 # not the IFNode that actually fires spikes
                                 sn=net.conv_fc[i + 1],
                                 tau_pre=2.,
                                 tau_post=2.,
                                 f_pre=f_weight, f_post=f_weight)
        )

net.to(args.device)
for epoch in range(start_epoch, args.epochs):
    start_time = time.time()
    net.train()
    for i in range(stdp_learners.__len__()):
        stdp_learners[i].enable()
    train_loss = 0
    train_acc = 0
    train_samples = 0
    for img, label in train_loader:
        optimizer_gd.zero_grad()
        optimizer_stdp.zero_grad()

        img = img.to(args.device)
        label = label.to(args.device)
        label_onehot = F.one_hot(label, 10).float()

        out_fr = net(img)
        loss = F.mse_loss(out_fr, label_onehot)
        # caution: detach_() detaches `loss` from the autograd graph in place, so this
        # backward() produces no gradients for the network parameters and
        # optimizer_gd.step() has nothing to apply
        loss1 = loss.detach_().requires_grad_(True)
        loss1.backward()
        # stdp
        optimizer_stdp.zero_grad()
        for i in range(stdp_learners.__len__()):
            stdp_learners[i].step(on_grad=True)
        optimizer_gd.step()
        optimizer_stdp.step()
        for i in range(stdp_learners.__len__()):  # clean the record
            stdp_learners[i].reset()

        train_samples += label.numel()
        train_loss += loss1.item() * label.numel()
        train_acc += (out_fr.argmax(1) == label).float().sum().item()
        torch.cuda.empty_cache()
        functional.reset_net(net)

    train_loss /= train_samples
    train_acc /= train_samples
    print('epoch : ',epoch,'  ;  ','train_loss : ',train_loss,'  ;  ','train_acc : ',train_acc)

Wang-jun-yu avatar Jun 07 '23 09:06 Wang-jun-yu

I suggest setting the STDP learning rate to 0 first and checking whether the network converges with GD alone, to verify that the network is training correctly.
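For example, a minimal sketch of this check (only the learning rate changes; the rest of the loop stays the same):

    optimizer_stdp = SGD(params_stdp, lr=0., momentum=0.)  # makes the STDP updates no-ops
    # if training accuracy still stays at chance level, STDP is not the (only) problem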

fangwei123456 avatar Jun 07 '23 09:06 fangwei123456

Setting lr to 0 in optimizer_stdp = SGD(params_stdp, lr=0., momentum=0.) gives:

    epoch : 0 ; train_loss : 0.10000000149011612 ; train_acc : 0.10852148579752367
    epoch : 1 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 2 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 3 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 4 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 5 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469
    epoch : 6 ; train_loss : 0.10000000149011612 ; train_acc : 0.1088856518572469

After switching to a single optimizer, the network classifies normally. The modified code is:

optimizer = torch.optim.SGD(net.parameters(), lr=args.lr, momentum=args.momentum)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, args.epochs)
for epoch in range(start_epoch, args.epochs):
    start_time = time.time()
    net.train()
    for i in range(stdp_learners.__len__()):
        stdp_learners[i].enable()  # the learners still record during forward, but step() is never called here
    train_loss = 0
    train_acc = 0
    train_samples = 0
    for img, label in train_loader:
        optimizer.zero_grad()
        img = img.to(args.device)
        label = label.to(args.device)
        label_onehot = F.one_hot(label, 10).float()
        out_fr = net(img)
        loss = F.mse_loss(out_fr, label_onehot)
        loss.backward()
        optimizer.step()
        train_samples += label.numel()
        train_loss += loss.item() * label.numel()
        train_acc += (out_fr.argmax(1) == label).float().sum().item()
        torch.cuda.empty_cache()
        functional.reset_net(net)

    train_loss /= train_samples
    train_acc /= train_samples
    print('epoch : ',epoch,'  ;  ','train_loss : ',train_loss,'  ;  ','train_acc : ',train_acc)

The output is:

    epoch : 0 ; train_loss : 0.12450837841269663 ; train_acc : 0.13000728332119446
    epoch : 1 ; train_loss : 0.13798252292362687 ; train_acc : 0.1540422432629279
    epoch : 2 ; train_loss : 0.12840495524196754 ; train_acc : 0.1540422432629279

The CSNN used in this experiment reaches up to 95% accuracy on my dataset, so the network itself should be fine. I don't know what goes wrong when training with STDP that makes classification fail.

Wang-jun-yu avatar Jun 07 '23 10:06 Wang-jun-yu

As an unsupervised learner, STDP gives no guarantee that performance will improve when you add it. If pure GD training works fine, you will have to tune the STDP parameters patiently.
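If it comes to tuning, the main knobs are the STDP learning rate, the trace time constants, and the weight-dependence functions. An illustrative sketch (the values and the names conv / sn are placeholders, not recommendations):

    # conv: a Conv2d layer trained by STDP; sn: the spiking neuron layer it feeds
    optimizer_stdp = SGD(params_stdp, lr=1e-4, momentum=0.)   # far smaller than the GD lr
    learner = learning.STDPLearner(step_mode='m', synapse=conv, sn=sn,
                                   tau_pre=2., tau_post=2.,   # pre-/post-synaptic trace time constants
                                   f_pre=f_weight, f_post=f_weight)  # weight-dependence / clipping functions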

fangwei123456 avatar Jun 07 '23 11:06 fangwei123456

(Quotes Wang-jun-yu's reply above in full.)

Hello, I ran into the same problem. Did you manage to solve it?

EECSPeanuts avatar Oct 26 '23 11:10 EECSPeanuts

I want to stress once more that this is a feature of STDP, not a bug 😂

fangwei123456 avatar Oct 26 '23 12:10 fangwei123456

STDP simply does not work as a training algorithm on deep SNNs, even when the STDP implementation itself is correct; the tutorial only demonstrates how to use it.

Yanqi-Chen avatar Oct 27 '23 02:10 Yanqi-Chen

I just ran into this problem as well; it looks like I'll have to go without STDP for now.

thebug-dev avatar Apr 26 '24 10:04 thebug-dev