Creating a new dataset

Open scientist1642 opened this issue 8 years ago • 3 comments

Hi, I'm trying to create a new unlabeled dataset and had some questions: https://github.com/CuriousAI/tagger/blob/master/data/shapes.py#L110 As far as I can see, masks are only used to calculate the AMI score and don't take part in the training process, right? What about the "codes", what are they used for? In shapes they're empty, and in Freq20-MNIST they seem related to textures.

scientist1642 avatar Feb 09 '17 09:02 scientist1642

Correct. At some point we used codes to see how well we could classify textures in the image. They are also not used for training. So just using arrays of zeros should work fine.

Qwlouse avatar Feb 09 '17 10:02 Qwlouse
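A minimal sketch of the "arrays of zeros" suggestion, for anyone landing here later. The array layout `(num_samples, height, width, channels)`, the `code_dim` parameter, and the helper name `make_placeholder_labels` are assumptions made for illustration and are not taken from the repository; check `data/shapes.py` for the shapes and dataset names the loader actually expects.

```python
import numpy as np


def make_placeholder_labels(images, code_dim=1):
    """Build all-zero masks and codes for an unlabeled image array.

    Assumption: `images` has shape (num_samples, height, width, channels).
    The real tagger data files may use a different layout.
    """
    images = np.asarray(images)
    # One mask value per pixel, all zero (no ground-truth grouping).
    masks = np.zeros(images.shape[:-1] + (1,), dtype=np.uint8)
    # One placeholder code vector per sample, all zero.
    codes = np.zeros((images.shape[0], code_dim), dtype=np.float32)
    return masks, codes


# Hypothetical unlabeled data: 8 random single-channel 20x20 images.
images = np.random.rand(8, 20, 20, 1).astype(np.float32)
masks, codes = make_placeholder_labels(images)
```

Since the masks here carry no grouping information, any AMI score reported against them should probably be ignored.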

Hi @Qwlouse, I am wondering how to choose the rough network size in order to not over/underfit. For example, does the shapes dataset really need a (2000, 1000, 500) ladder?

scientist1642 avatar May 08 '17 23:05 scientist1642

We haven't seen a case of overfitting yet. However, for some reason the shapes network needs to be rather big. You can run with fewer units, but performance does degrade.

Qwlouse avatar May 11 '17 09:05 Qwlouse