random uniform choosing of k in k-shot in training
In the original paper, "the number of shots K was chosen uniformly at random from 1 to 5." So the forward function of SNAIL should accept an arbitrary K rather than having K fixed at init, and training should sample K uniformly at random in the same way.
Thanks!
Good catch, let me update that and update the numbers based off of that.
Have you already updated the code for this?
I implemented it in a fork of this repo: https://github.com/ericjang/snail-pytorch/commit/7710e393924bdab4e0d01afeb427019a875a7d16
It wasn't clear to me from reading the paper exactly how random K-shot is implemented, since the number of parameters in a TC block depends on K. So I implemented it by zeroing out random.randint(0, K-1)*N entries of each minibatch at training time.
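To make the masking idea concrete, here is a minimal sketch in plain Python. It is not the code from the linked commit. The function name `mask_random_shots`, the list-of-vectors input, and the layout (K consecutive groups of N samples, one per class in each group) are all assumptions made for illustration.

```python
import random


def mask_random_shots(support, n_way, k_shot, rng=random):
    """Zero out d * n_way support samples, with d uniform in [0, k_shot - 1].

    Hypothetical helper, not taken from the repo. Assumes `support` is a list
    of n_way * k_shot feature vectors laid out as k_shot consecutive groups
    of n_way samples (one sample per class in each group).
    """
    if len(support) != n_way * k_shot:
        raise ValueError("expected n_way * k_shot support samples")

    # random.randint is inclusive on both ends, so d is in {0, ..., k_shot - 1}.
    dropped_shots = rng.randint(0, k_shot - 1)
    n_masked = dropped_shots * n_way

    # Replace the first n_masked samples with zero vectors; copy the rest.
    masked = [
        [0.0] * len(x) if i < n_masked else list(x)
        for i, x in enumerate(support)
    ]

    # Shots per class that remain visible: uniform over {1, ..., k_shot}.
    effective_k = k_shot - dropped_shots
    return masked, effective_k
```

With this layout the sequence length stays at N*K, so the TC block keeps the same shape, while the number of visible shots per class varies between 1 and K. The labels at the masked positions would presumably need to be zeroed as well, which is not shown here.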
I emailed Nikhil Mishra, and he told me that the "random uniform choosing of k in k-shot in training" did not make the model perform better; it was just a trick so they could evaluate 1-shot and 5-shot without retraining the model. So the discrepancy in performance doesn't seem to come from there.