sunxin010205
Results: 1 issue of sunxin010205
Hi! After replacing an eight-layer Transformer encoder with Mamba, the training loss fails to decrease. Could it be that Mamba doesn't perform as effectively as the Transformer in the diffusion...