
combined.loss is not equal to loss of output feature?

Open · Jeffwan opened this issue 2 years ago · 1 comment

Describe the bug

I noticed that even when I choose a single output feature, combined.loss != output_feature.loss. I thought that if only one output feature is given, these should be exactly the same? Do I understand this correctly? (BTW, ludwig train works as expected; this seems to happen only in HPO.)

To Reproduce

wget https://ludwig-ai.github.io/ludwig-docs/0.5/data/rotten_tomatoes.csv
ludwig init_config --dataset /data/rotten_tomatoes.csv --target=recommended --hyperopt=true --time_limit_s=300 --output /data/rotten_tomatoes.yaml
ludwig hyperopt --config rotten_tomatoes.yaml --dataset /data/rotten_tomatoes.csv

then check the test_statistics.json under the best trial folder.
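For reference, something like the following can be used to compare the two values programmatically (this assumes the usual layout of Ludwig's test statistics, with a top-level "combined" entry and one entry per output feature; adjust paths and keys as needed):

```python
import json

# Path to the statistics file in the best trial's output directory (adjust as needed).
with open("test_statistics.json") as f:
    stats = json.load(f)

combined_loss = stats["combined"]["loss"]
feature_loss = stats["recommended"]["loss"]  # "recommended" is the target feature in this example

print(f"combined.loss     = {combined_loss}")
print(f"recommended.loss  = {feature_loss}")
print(f"difference        = {combined_loss - feature_loss}")
```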

Expected behavior: These two values should be the same.

Screenshots: [screenshot attached]

Environment (please complete the following information):

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Python version
  • Ludwig version -> nightly

Additional context Add any other context about the problem here.

Jeffwan · Aug 08 '22 18:08

Hey @Jeffwan, you make a great point, and we should probably document this better in the docs (@justinxzhao @dantreiman FYI). The combined loss is roughly defined as output_feature_weight * output_feature_loss + regularization_lambda * regularization + additional_losses. So if you have turned on regularization in your config, that would likely explain why the two losses differ.

Another possible explanation is the additional_losses: some models, TabNet for instance, have their own losses (a sparsity-inducing loss in TabNet's case), and those are added to the combined loss rather than to any individual feature loss, because if the model has two output features and the additional loss comes from, say, the combiner, you don't want to count it twice. Does this make sense? Was either of those two scenarios the case for your model? If it happens only in HPO, maybe one of the parameters you are hyperopting over is a regularization parameter?
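For illustration only, here is a rough sketch of how such a combination works (the names are illustrative; this is not Ludwig's actual implementation):

```python
def combined_loss(feature_losses, feature_weights, regularization_lambda,
                  regularization_loss, additional_losses):
    """Sketch of a combined loss: a weighted sum of per-feature losses,
    plus regularization, plus any model-specific additional losses
    (e.g. a sparsity-inducing loss in TabNet)."""
    loss = sum(w * l for w, l in zip(feature_weights, feature_losses))
    loss += regularization_lambda * regularization_loss
    loss += sum(additional_losses)
    return loss

# With a single output feature of weight 1.0, no regularization, and no
# additional losses, the combined loss reduces to exactly the feature loss:
assert combined_loss([0.42], [1.0], 0.0, 0.0, []) == 0.42
```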

w4nderlust · Aug 09 '22 00:08

I'll create a ticket to add this detail to our public modeling docs.

justinxzhao · Aug 12 '22 15:08