sunxin010205

Results: 1 issue by sunxin010205

Hi! After replacing an eight-layer Transformer encoder with Mamba, the training loss fails to decrease. Could it be that Mamba doesn't perform as effectively as the Transformer in the diffusion...