Chen Liu issues

Results: 7 issues by Chen Liu

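Hi, when I update the critic first and then the actor (as the paper does), the gradient update fails because of an in-place operation. I really can't figure out how to fix it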

`self.critic_optim.zero_grad(); critic_loss.backward(); self.critic_optim.step(); self.actor_optim.zero_grad(); actor_loss.backward(); self.actor_optim.step()` After I change the order to this, it raises an error: the gradient update fails because of an in-place operation. Thanks a lot.
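One common cause of this error, assuming `actor_loss` was computed through the critic before `critic_optim.step()` ran, is that the optimizer step modifies the critic's weights in place, so the graph saved for `actor_loss` no longer matches them. The following is a minimal PyTorch sketch of an ordering that avoids this by building the actor loss only after the critic step. The networks, sizes, and data are hypothetical and are not taken from the repository in question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy networks; shapes are for illustration only.
actor = nn.Linear(4, 2)        # state -> action
critic = nn.Linear(4 + 2, 1)   # (state, action) -> Q value
actor_optim = torch.optim.SGD(actor.parameters(), lr=0.1)
critic_optim = torch.optim.SGD(critic.parameters(), lr=0.1)

state = torch.randn(8, 4)
action = torch.randn(8, 2)     # actions from a replay buffer (made up)
target_q = torch.randn(8, 1)   # TD targets (made up)

actor_before = actor.weight.detach().clone()
critic_before = critic.weight.detach().clone()

# 1) Critic update: uses stored actions, so no graph through the actor.
critic_loss = F.mse_loss(critic(torch.cat([state, action], dim=1)), target_q)
critic_optim.zero_grad()
critic_loss.backward()
critic_optim.step()            # modifies the critic weights in place

# 2) Actor update: build the loss AFTER the critic step, so its graph
#    refers to the critic weights as they are now.
actor_loss = -critic(torch.cat([state, actor(state)], dim=1)).mean()
actor_optim.zero_grad()
actor_loss.backward()
actor_optim.step()
```

Note that `actor_loss.backward()` also leaves gradients on the critic's parameters; they are discarded by the next `critic_optim.zero_grad()`.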

May I ask some questions about training?

When we train the current task, do we use the data from the previous task? EWC needs task A data to compute the Fisher information; when we train task B, how...
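As a point of reference, in the standard EWC formulation the task A data is needed only once, right after training on task A, to estimate a diagonal Fisher matrix. Training on task B then uses just the stored Fisher values and the stored task A parameters. The sketch below illustrates that flow in plain Python under these assumptions; the function names and all numbers are hypothetical, and the repository being asked about may do it differently.

```python
def diagonal_fisher(grads_per_sample):
    """Diagonal Fisher estimate: mean of squared per-sample gradients."""
    n = len(grads_per_sample)
    dim = len(grads_per_sample[0])
    return [sum(g[i] ** 2 for g in grads_per_sample) / n for i in range(dim)]


def ewc_penalty(theta, theta_star, fisher, lam):
    """Quadratic penalty (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * sum(
        f * (t - s) ** 2 for f, t, s in zip(fisher, theta, theta_star)
    )


# After task A: per-sample log-likelihood gradients on task A data (made up).
grads_task_a = [[1.0, 0.0], [3.0, 0.0]]
fisher_a = diagonal_fisher(grads_task_a)
theta_star_a = [0.5, -0.2]  # parameters at the end of task A (made up)

# During task B: only fisher_a and theta_star_a are needed, not task A data.
theta = [1.5, 2.0]          # current parameters while training on task B
penalty = ewc_penalty(theta, theta_star_a, fisher_a, lam=2.0)
# total_loss = task_b_loss + penalty
```

Here the second parameter has zero Fisher weight, so it can move freely on task B, while the first is pulled back toward its task A value.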

Why are the logits on the numerator in the loss function not masked for comparing a sample with itself?

Hi, I find that in PaCo or GPaCo the logits in the numerator of the loss function are not masked, but the denominator of the loss function is masked because `exp_logits...
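For comparison, here is a plain Python sketch of a generic supervised-contrastive-style loss in which the self-pair is excluded from both the denominator and the set of positives in the numerator. It is an illustration of the masking being asked about, not the PaCo or GPaCo implementation; the function name and inputs are hypothetical.

```python
import math


def supcon_style_loss(sim, labels):
    """sim: n x n list of similarity logits; labels: one class label per sample."""
    n = len(labels)
    total, counted = 0.0, 0
    for i in range(n):
        # Denominator: every sample except i itself (self-pair masked out).
        denom = sum(math.exp(sim[i][j]) for j in range(n) if j != i)
        # Numerator terms: same-class samples, again excluding i itself.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        log_probs = [sim[i][j] - math.log(denom) for j in positives]
        total += -sum(log_probs) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# With all-zero logits every other sample is equally likely, so each
# positive has probability 1/3 when there are 4 samples.
sim = [[0.0] * 4 for _ in range(4)]
loss = supcon_style_loss(sim, labels=[0, 0, 1, 1])
```

If the self-pair were left in the positives, the term `sim[i][i]` would enter the numerator for every anchor, which is the behaviour the question is probing.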

How are the clients' gradients updated?

In the train function of Configuration_1.ipynb, the server receives the output from the clients, computes the loss, and calls backward(), but how are the gradients updated on the clients? 😥 Thanks.
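In a typical split-learning setup, the server sends the gradient of the loss with respect to the received activations back to the client, and the client continues backpropagation from there. The PyTorch sketch below shows that hand-off in a single process. The models, shapes, and data are hypothetical, and this is not a description of what Configuration_1.ipynb actually does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

client_model = nn.Linear(4, 3)  # hypothetical client-side layers
server_model = nn.Linear(3, 2)  # hypothetical server-side layers
client_optim = torch.optim.SGD(client_model.parameters(), lr=0.1)
server_optim = torch.optim.SGD(server_model.parameters(), lr=0.1)

x = torch.randn(8, 4)           # client-side inputs (made up)
y = torch.randint(0, 2, (8,))   # labels held by the server (made up)
client_before = client_model.weight.detach().clone()

# Client: forward pass up to the cut layer, then "send" the activations.
client_optim.zero_grad()
activations = client_model(x)

# Server: treat the received activations as a leaf tensor with gradients.
server_input = activations.detach().requires_grad_(True)
server_optim.zero_grad()
loss = F.cross_entropy(server_model(server_input), y)
loss.backward()
server_optim.step()
grad_to_client = server_input.grad  # what the server would send back

# Client: resume backpropagation from the cut layer and update.
activations.backward(grad_to_client)
client_optim.step()
```

If both halves run in the same process and the activations are passed without `detach()`, a single `loss.backward()` reaches the client parameters directly and only the two optimizer steps remain.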

EWC training

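Hi, when you train with EWC, is the first task trained without EWC, i.e. just to obtain a base model, with the EWC constraint applied only to the later tasks? Or is the base model first trained on the first task and then trained once more with EWC? (I may not have understood the code correctly.)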
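For context, one common way to write the EWC objective for the $k$-th task is shown below. The notation ($\lambda$, $F^{(j)}$, $\theta^{*(j)}$) is assumed here for illustration, and this is not a statement about how the repository in question implements it.

```latex
\mathcal{L}_k(\theta)
  = \mathcal{L}^{\text{task}}_k(\theta)
  + \frac{\lambda}{2} \sum_{j < k} \sum_i
      F^{(j)}_i \left( \theta_i - \theta^{*(j)}_i \right)^2
```

Under this formulation the sum over earlier tasks is empty for $k = 1$, so the first task is trained with the plain task loss, and the penalty only takes effect from the second task on.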

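While reading your code, I found that bimamba has v1 and v2 versions, corresponding to BimambaInnerFn and MambaInnerFnNoOutProj respectively. v1 runs a forward and a backward pass over the same sequence but uses the same proj module to compute B and C, whereas v2 defines the proj module twice (x_proj and x_proj_b) to compute different B and C. Why is it done this way, and which of the two works better? Finally, is the step() function in mamba never executed during either training or inference, given that step does not specify the two-pass computation? Thank you very much. 😦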

The fine-tuning training process

Hello, this is great work. The paper mentions several mask schemes in the training process. Could you open-source the training process? Thanks. 😀