Creating a new dataset
Hi, I'm trying to create a new unlabeled dataset and had some questions: https://github.com/CuriousAI/tagger/blob/master/data/shapes.py#L110 As far as I can see, masks are only used to calculate the AMI score and don't take part in the training process, right? What about the "codes": what are they used for? In shapes they are empty, and in Freq20-MNIST they seem related to textures.
Correct. At some point we used codes to see how well we could classify textures in the image. They are also not used for training. So just using arrays of zeros should work fine.
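For anyone else building an unlabeled dataset, here is a minimal sketch of the "arrays of zeros" idea using only NumPy. The function name `make_unlabeled_dataset`, the dictionary keys (`features`, `mask`, `codes`) and the array shapes are assumptions for illustration, not the layout the repository expects; check `data/shapes.py` for the actual names and shapes before writing your file.

```python
import numpy as np


def make_unlabeled_dataset(images):
    """Bundle images with all-zero placeholder masks and codes.

    Hypothetical helper: the key names and shapes below are assumptions,
    not the format used by the tagger repository.
    """
    images = np.asarray(images, dtype=np.float32)
    if images.ndim != 3:
        raise ValueError("expected images with shape (N, H, W)")
    n = images.shape[0]
    return {
        "features": images,
        # Placeholder masks: one (zero) group label per pixel.
        "mask": np.zeros(images.shape, dtype=np.uint8),
        # Placeholder codes: one (zero) entry per image.
        "codes": np.zeros((n, 1), dtype=np.uint8),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = (rng.random((4, 28, 28)) > 0.5).astype(np.float32)
    data = make_unlabeled_dataset(demo)
    for name, arr in data.items():
        print(name, arr.shape, arr.dtype)
```

Note that with all-zero masks there is no ground-truth grouping to compare against, so any AMI score reported for such a dataset should not be read as meaningful.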
Hi @Qwlouse, I am wondering how to choose the rough network size in order to not over/underfit. For example, does the shapes dataset really need a (2000, 1000, 500) ladder?
We haven't seen a case of overfitting yet. However, for some reason the shapes network needs to be rather big. You can run with fewer units, but performance does degrade.