perceiver-pytorch

Network can't train when incorporating this

Open abeyang00 opened this issue 3 years ago • 5 comments

I have added the Perceiver to my current network, and it seems like the network can't be trained. AP stays at zero the whole time and never improves.

Does the code need to be changed in order to incorporate it into another network?

abeyang00 avatar Mar 25 '21 09:03 abeyang00

I saw the same problem. In fact, it doesn't work well in FP16; I'm getting NaNs really quickly (generally by epoch 2). Maybe try FP32? Sometimes it doesn't converge either. Here is my code: https://github.com/clementpoiret/Perceiver_MNIST
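If you want to stay in mixed precision, loss scaling plus gradient clipping sometimes keeps the NaNs away. A minimal sketch using PyTorch's AMP utilities; `model`, `optimizer`, `criterion`, and `loader` below are placeholders, not anything from this repo:

```python
import torch

# Placeholders -- substitute your Perceiver, optimizer, and real data loader.
model = torch.nn.Linear(784, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()
loader = [(torch.randn(8, 784), torch.randint(0, 10, (8,)))]

scaler = torch.cuda.amp.GradScaler()

for images, labels in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():          # run the forward pass in FP16
        loss = criterion(model(images.cuda()), labels.cuda())
    scaler.scale(loss).backward()            # scale the loss against FP16 underflow
    scaler.unscale_(optimizer)               # unscale before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    scaler.step(optimizer)                   # skips the step if grads are inf/NaN
    scaler.update()
```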

clementpoiret avatar Apr 15 '21 08:04 clementpoiret

@clementpoiret I took a quick look at your repo: Are you trying to classify MNIST?

Having not used it myself yet, I think the user needs to specify the objective by adding a head to the Perceiver (e.g., a classifier head).

amqdn avatar Apr 15 '21 20:04 amqdn

@clementpoiret

Never mind. I see in the code now that to_logits includes a Linear layer to num_classes, and that you've also included that in your code. Huh.
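For reference, a minimal sketch of instantiating the model for MNIST classification; the keyword arguments follow the repo's README, so double-check them against your installed version:

```python
import torch
from perceiver_pytorch import Perceiver

# Perceiver configured for 10-class MNIST; to_logits inside the model
# maps the pooled latents to num_classes.
model = Perceiver(
    input_channels=1,    # MNIST images are single-channel
    input_axis=2,        # 2D data (height, width)
    num_freq_bands=6,    # Fourier feature bands for the positional encoding
    max_freq=10.,
    depth=4,
    num_latents=128,
    latent_dim=256,
    num_classes=10,      # size of the final Linear layer in to_logits
)

imgs = torch.randn(8, 28, 28, 1)   # channels-last, as the model expects
logits = model(imgs)               # shape (8, 10)
```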

amqdn avatar Apr 16 '21 01:04 amqdn

Yes, you're right; I tried this quickly. But it's pretty slow to converge, and sometimes it doesn't even learn at all.

clementpoiret avatar Apr 16 '21 09:04 clementpoiret

Maybe you should try a warmup learning rate scheduler? Transformers are particularly sensitive to the learning rate schedule.
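For example, a linear warmup followed by a constant rate is easy to set up with `LambdaLR`; a rough sketch, with the warmup length just a guess and the model a placeholder:

```python
import torch

# Placeholder model/optimizer -- substitute your own.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

warmup_steps = 1000  # assumed warmup length; tune for your dataset

# Ramp the LR linearly from ~0 to the base value over warmup_steps,
# then hold it constant.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps),
)

# In the training loop, call scheduler.step() after each optimizer.step().
```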

OctoberKat avatar Jul 06 '21 07:07 OctoberKat