combined.loss is not equal to loss of output feature?
Describe the bug
I noticed that even when I choose only one output feature, combined.loss != output_feature.loss. I thought that if only one output feature is given, these should be exactly the same? Do I understand it correctly? (BTW, ludwig train works as expected; this seems to happen only in HPO.)
To Reproduce
wget https://ludwig-ai.github.io/ludwig-docs/0.5/data/rotten_tomatoes.csv
ludwig init_config --dataset rotten_tomatoes.csv --target=recommended --hyperopt=true --time_limit_s=300 --output rotten_tomatoes.yaml
ludwig hyperopt --config rotten_tomatoes.yaml --dataset rotten_tomatoes.csv
Then check the test_statistics.json under the best trial's folder.
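A quick way to compare the two values is a short script like the one below. The trial path is hypothetical (substitute your own best-trial directory), and it assumes test_statistics.json is keyed by output feature name plus a "combined" entry:

```python
import json

# Hypothetical path: substitute the actual best-trial directory
# produced by your hyperopt run.
with open("hyperopt_output/best_trial/test_statistics.json") as f:
    stats = json.load(f)

# Compare the combined loss against the single output feature's loss.
print("combined loss:      ", stats["combined"]["loss"])
print("output feature loss:", stats["recommended"]["loss"])
```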
Expected behavior
These two values should be the same.
Environment:
- Ludwig version: nightly
Hey @Jeffwan, you make a great point, and we should probably document this better in the docs (@justinxzhao @dantreiman FYI).
The combined loss is roughly defined as output_feature_weight * output_feature_loss + regularization_lambda * regularization + additional_losses. So if you have turned on regularization in your config, that would likely explain why the losses differ. Another possible explanation is additional_losses: some models, such as TabNet, have their own losses (a sparsity-inducing loss in TabNet's case), and those are added to the combined loss rather than to each individual feature loss. If the model has two output features and the additional loss comes from, say, the combiner, you don't want to count it twice.
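To make this concrete, here is a minimal numeric sketch of how the combined loss can diverge from a single feature's loss. The values are made up for illustration; this is not Ludwig's actual implementation:

```python
# Hypothetical values, purely for illustration.
output_feature_losses = {"recommended": 0.4123}  # per-feature loss
output_feature_weights = {"recommended": 1.0}

regularization_lambda = 0.01  # nonzero when regularization is enabled
regularization_loss = 2.5     # e.g. the L2 norm of the model weights
additional_losses = 0.03      # e.g. TabNet's sparsity-inducing loss

combined_loss = (
    sum(output_feature_weights[name] * loss
        for name, loss in output_feature_losses.items())
    + regularization_lambda * regularization_loss
    + additional_losses
)

# Even with a single output feature, the extra terms shift the result:
print(combined_loss)  # 0.4673, vs. the feature loss of 0.4123
```

With regularization_lambda set to 0 and no additional losses, combined_loss collapses to exactly the output feature loss, which is the equality you expected.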
Does this make sense?
Was either of those two scenarios the case for your model? If it happens only in HPO, maybe one of the parameters you are hyperopting over is a regularization parameter?
I'll create a ticket to add this detail to our public modeling docs.