tianylin
Actually, after reading the source code, it seems that this problem comes from `matplotlib`. Maybe you could consider solving this with some sort of wrapper class? Of course, it's only a...
Same issue for the Factorized version :)
If you are running on a GPU, caching the inputs does not save memory (VRAM) at all, because caching samples is equivalent to increasing the batch size; the intermediate activations will consume more VRAM.
1. After storing the raw text, you have to forward the key encoder every time to obtain that many negative-sample representations, and all of that intermediate computation occupies extra VRAM. 2. Maintaining embeddings means one embedding per sentence, with no intermediate maxlen dimension. Clearly, the former uses more memory than the latter.
> > 1. After storing the raw text, you have to forward the key encoder every time to obtain that many negative-sample representations, and all of that intermediate computation occupies extra VRAM. 2. Maintaining embeddings means one embedding per sentence, with no intermediate maxlen dimension. Clearly, the former uses more memory than the latter.
>
> But setting aside the sharply increased computation, doesn't writing it this way directly solve the negative-sample consistency problem, so that no momentum encoder is needed at all?

Writing it this way is just equivalent to training with a larger batch size...
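A rough back-of-envelope sketch of the memory argument above. All concrete numbers here (queue size, sequence length, hidden size, layer count) are illustrative assumptions, not taken from the repository:

```python
# Rough back-of-envelope comparison (all numbers are illustrative assumptions).
num_negatives = 4096       # negative queue size K (assumed)
max_len       = 128        # tokens per cached sentence (assumed)
hidden        = 768        # encoder hidden size (assumed)
layers        = 12         # transformer layers whose activations are kept (assumed)
bytes_fp32    = 4          # bytes per float32 value

# Option 1: cache raw text and re-encode all negatives through the key encoder.
# Forwarding K sentences keeps per-layer activations of shape [K, max_len, hidden].
activations_bytes = num_negatives * max_len * hidden * layers * bytes_fp32

# Option 2: maintain one fixed embedding per sentence (a MoCo-style queue);
# the max_len dimension disappears and no forward activations are stored.
embedding_bytes = num_negatives * hidden * bytes_fp32

print(f"re-encode negatives: {activations_bytes / 2**30:.1f} GiB of activations")
print(f"embedding queue:     {embedding_bytes / 2**20:.1f} MiB")
```

Under these assumptions the re-encoding route needs roughly three orders of magnitude more VRAM than the embedding queue, which is the point being made in the comment above.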
`nn.ParameterList` is used in the code, which seems to be incompatible with `nn.DataParallel`. This causes the replicas to be empty. I think this is the problem.
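One possible workaround, sketched below under the assumption that the incompatibility is with how `nn.DataParallel` replicates `nn.ParameterList` entries: register each parameter as a direct attribute of the module instead, so replication treats them like ordinary parameters. The module and its names here are hypothetical, not the repository's actual code:

```python
import torch
import torch.nn as nn

class BankedModule(nn.Module):
    """Hypothetical module holding several parameter 'banks' without
    nn.ParameterList, as a workaround for the reported DataParallel issue."""

    def __init__(self, num_banks=3, dim=4):
        super().__init__()
        # Instead of: self.banks = nn.ParameterList([...])
        # register each parameter individually under a predictable name.
        for i in range(num_banks):
            self.register_parameter(f"bank_{i}", nn.Parameter(torch.zeros(dim)))
        self.num_banks = num_banks

    def bank(self, i):
        # Look the parameter back up by its registered name.
        return getattr(self, f"bank_{i}")

    def forward(self, x):
        # Trivial computation purely for illustration: add every bank to x.
        out = x.clone()
        for i in range(self.num_banks):
            out = out + self.bank(i)
        return out

m = BankedModule()
# All banks now appear in m.parameters(), so DataParallel's replica
# construction sees them like any other registered parameter.
print(len(list(m.parameters())))
```

The other route mentioned in this thread, pinning torch to a version where `ParameterList` replication still worked, avoids code changes but locks you to an old release.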
> I ran into the same error. I wonder if this has been solved? Thanks.

You can downgrade your torch to 1.4.0, which works fine for me (hint: you might have to...
I don't think directly using MSE loss would be an ideal technique. It is problematic when the output of the teacher model mismatches the given label. For classification tasks, the logits...