Jian
Taking the derivative of pt with respect to x, I think you dropped a negative sign; the correct expression is "- y * pt * (1 - pt)".
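Which sign is correct depends on how pt is defined, and a finite-difference check is a quick way to settle it. The sketch below is only an illustration, not code from the repository: the helper `derivative_sign` and both candidate definitions of pt (`sigmoid(y * x)` and `sigmoid(-y * x)`, with y in {-1, +1}) are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def derivative_sign(pt_fn, x, y, eps=1e-6):
    """Return +1 if d(pt)/dx matches y*pt*(1-pt), -1 if it matches -y*pt*(1-pt)."""
    pt = pt_fn(x, y)
    # Central finite difference as the numeric reference.
    numeric = (pt_fn(x + eps, y) - pt_fn(x - eps, y)) / (2 * eps)
    candidate = y * pt * (1 - pt)
    if math.isclose(numeric, candidate, rel_tol=1e-4):
        return 1
    if math.isclose(numeric, -candidate, rel_tol=1e-4):
        return -1
    raise ValueError("neither sign matches the numeric derivative")

# Two hypothetical definitions of pt (assumptions, not taken from the code).
pt_pos = lambda x, y: sigmoid(y * x)
pt_neg = lambda x, y: sigmoid(-y * x)

print(derivative_sign(pt_pos, 0.3, 1))
print(derivative_sign(pt_neg, 0.3, 1))
```

Plugging in the definition actually used in the code in place of `pt_pos` or `pt_neg` shows which sign the analytic gradient should carry.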
Thanks for your code. Would you mind sharing your training log of top-1 accuracy? I find that one training run takes too long to complete.
As the paper says, group convolutions with different group channels can extract features of different parts of an object. I wonder whether the performance improvement comes from the group conv or from the attention module. Do you...
There is visda-2017 in DA/data/, but no training script for it in the README file. Does this code support training on the VisDA dataset?
After adding a linear layer of (128, 50000) with the same hyperparameters, top-1 performance is only 30.
This is really nice work on semi-supervised learning. Would you mind providing the performance of your code compared with the performance reported in the paper?
I trained the model for nearly 24 hours but only got through about one tenth of the training stage (`t = 108800 / 1000000`). My device is a single TITAN Xp.
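A rough extrapolation of the full run time from those numbers, assuming throughput stays constant for the whole run (an assumption; evaluation or checkpointing phases could change it). The helper name `estimate_total_hours` is made up for this sketch.

```python
def estimate_total_hours(steps_done, total_steps, hours_elapsed):
    """Linearly extrapolate total wall-clock hours from progress so far."""
    fraction_done = steps_done / total_steps
    return hours_elapsed / fraction_done

# Numbers from the report above: t = 108800 / 1000000 after about 24 hours.
total = estimate_total_hours(108800, 1000000, 24.0)
remaining = total - 24.0
print(f"{total:.1f} h total ({total / 24:.1f} days), {remaining:.1f} h remaining")
```

At this rate the full schedule would take roughly nine days on one GPU.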
Thanks for your detailed code. If I understand correctly, as Section 3.2.4 of the paper says, > the combination is an _element wise multiplication_ of ..... but in core.model, the implementation is [torch.matmul](https://github.com/Clarifai/few-shot-ctm/blob/a3e79f8ef0674417709b9cc7051862ea99be80aa/core/model.py#L617)...
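For reference, the two operations generally give different results even on inputs of the same shape. This sketch uses NumPy rather than PyTorch, and the toy 2x2 arrays are made up for illustration; it does not reproduce the tensors or shapes in core/model.py.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

# Element-wise (Hadamard) product: each entry is a[i, j] * b[i, j].
elementwise = a * b

# Matrix product: each entry is a row of a dotted with a column of b.
matrix_product = a @ b

print(elementwise)
print(matrix_product)
```

In PyTorch the corresponding calls would be `a * b` (or `torch.mul`) for the element-wise product and `torch.matmul` for the matrix product.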
All the sums in Eq. 2 should be products, and the Cij is not the same as in Eq. 1. Please correct me if I am wrong. Thanks.
Is there any plan to release the code for the mix&match paper?