Losses quickly converge to zero
Have you run into the problem where the loss converges to zero within two epochs, even with very large swap noise (>0.5) or dropout? Meanwhile, the transformed features do not contain useful information. I am not sure whether this is caused by the dataset or not...
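For context, by swap noise I mean something like this (a rough numpy sketch, not the repo's actual implementation; `p` is the swap probability):

```python
import numpy as np

def swap_noise(X, p=0.5, rng=None):
    """Corrupt X by replacing each cell, with probability p, by the value
    from the same column of a randomly chosen donor row."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    mask = rng.random((n, d)) < p                 # cells to corrupt
    donor_rows = rng.integers(0, n, size=(n, d))  # donor row per cell
    cols = np.tile(np.arange(d), (n, 1))
    X_noisy = X.copy()
    X_noisy[mask] = X[donor_rows[mask], cols[mask]]
    return X_noisy
```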
Hi, can you describe the characteristics of the dataset a bit more? I found a bug in the code that handles categorical feature embedding. If your dataset is mostly categorical features and uses embeddings, that might be the issue.
It is weird. My dataset has only continuous features. Even when I use a single hidden unit, the loss can still go down to almost zero... It looks like there is some leakage either in the model or in my data.
Ok, thanks for the details. You could try holding out one column and using all the rest to predict it with a simple model; if you can predict the holdout almost perfectly, that column is (nearly) a function of the others, which would explain the near-zero reconstruction loss.
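Something like this (a quick scikit-learn sketch; `Ridge` and the column loop are just examples, any simple model works):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def leakage_check(X, col, cv=5):
    """Hold out column `col` and predict it from the remaining columns.
    A near-perfect R^2 from a simple linear model suggests the column
    is (almost) determined by the others, i.e. leakage/redundancy."""
    y = X[:, col]
    X_rest = np.delete(X, col, axis=1)
    scores = cross_val_score(Ridge(), X_rest, y, scoring="r2", cv=cv)
    return scores.mean()

# e.g. check every column:
# for j in range(X.shape[1]):
#     print(j, leakage_check(X, j))
```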