
Temperature scaling on segmentation tasks

Open Karol-G opened this issue 3 years ago • 1 comments

Hi,

I wanted to apply temperature scaling to a segmentation task with one class (so each pixel either belongs to this class or not). Instead of CrossEntropyLoss I am using BCEWithLogitsLoss, and I had to disable _ECELoss due to some bugs I could not fix. However, the actual evaluation score is much worse after temperature scaling than before. Furthermore, I noticed that the temperature becomes very high during optimization. What are your thoughts on this? It does not seem correct that the temperature becomes this high. Is temperature scaling simply not suited for segmentation? Did you run tests on any segmentation tasks?

-----
Skipped about 10 steps here
-----
self.temperature:  Parameter containing:
tensor([3.4501], requires_grad=True)
self.temperature:  Parameter containing:
tensor([7.5331], requires_grad=True)
self.temperature:  Parameter containing:
tensor([15.8106], requires_grad=True)
self.temperature:  Parameter containing:
tensor([49.7303], requires_grad=True)
self.temperature:  Parameter containing:
tensor([975.0599], requires_grad=True)
self.temperature:  Parameter containing:
tensor([8138381.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([16275787.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([24413192.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([32550598.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([40688004.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([40688004.], requires_grad=True)
Optimal temperature: 40688004.000
After temperature - NLL: 0.693

Best,
Karol

Karol-G, Mar 23 '21 13:03
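A note on the final value in the log above: 0.693 is ln 2, which is the binary cross-entropy when every predicted probability is exactly 0.5. Dividing any finite logit by a temperature in the tens of millions pushes it to roughly zero, so the sigmoid output collapses to 0.5 for every pixel, regardless of the labels. The following is a minimal sketch that reproduces this in plain Python. The logits and labels are made up for illustration, and the helper names `bce_with_logits` and `mean_nll` are hypothetical, not part of the repository.

```python
import math


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit (natural log)."""
    # Equivalent to -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_nll(logits, targets, temperature):
    """Mean NLL after dividing every logit by the temperature."""
    losses = [bce_with_logits(z / temperature, y) for z, y in zip(logits, targets)]
    return sum(losses) / len(losses)


# Hypothetical pixel logits and binary labels, invented for this example.
logits = [4.2, -3.1, 0.7, -5.6, 2.9, -0.4]
targets = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]

nll_unscaled = mean_nll(logits, targets, temperature=1.0)
# Final temperature reported in the log above.
nll_huge_t = mean_nll(logits, targets, temperature=40688004.0)

print(f"NLL at T=1:        {nll_unscaled:.3f}")
print(f"NLL at T=40688004: {nll_huge_t:.3f}")
print(f"ln 2:              {math.log(2):.3f}")
```

If the NLL before scaling is already below ln 2, a final NLL of 0.693 means the optimizer ended at a worse point than where it started, which suggests the optimization diverged rather than found a calibrated temperature.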

We use it for segmentation, with the CrossEntropyLoss. However, I have problems with L-BFGS myself. I believe this is because the loss is locally linear around the starting temperature: the loss is evaluated many times although only one optimizer step is performed, and the optimizer fails to move far from the starting point.

hmeine, Mar 30 '21 11:03
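Since only a single scalar is being fitted, one way to sidestep the L-BFGS issues described above is a bounded one-dimensional search. The sketch below is not what the repository does: it swaps L-BFGS for a golden-section search over log(T), which keeps the temperature positive and confined to a chosen interval. It assumes the binary (BCEWithLogitsLoss-style) setting from the original question. The function names, the search bounds, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean_nll(logits, targets, temperature):
    """Mean NLL after dividing every logit by the temperature."""
    total = sum(bce_with_logits(z / temperature, y) for z, y in zip(logits, targets))
    return total / len(logits)


def fit_temperature(logits, targets, t_min=0.05, t_max=100.0, iters=60):
    """Golden-section search over u = log(T) within [log t_min, log t_max]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log(t_min), math.log(t_max)

    def objective(u):
        return mean_nll(logits, targets, math.exp(u))

    a = hi - inv_phi * (hi - lo)
    b = lo + inv_phi * (hi - lo)
    fa, fb = objective(a), objective(b)
    for _ in range(iters):
        if fa < fb:
            # Minimum lies in [lo, b]; reuse a as the new upper probe.
            hi, b, fb = b, a, fa
            a = hi - inv_phi * (hi - lo)
            fa = objective(a)
        else:
            # Minimum lies in [a, hi]; reuse b as the new lower probe.
            lo, a, fa = a, b, fb
            b = lo + inv_phi * (hi - lo)
            fb = objective(b)
    return math.exp((lo + hi) / 2.0)


# Synthetic, deliberately overconfident "pixel" logits: labels are drawn from
# sigmoid(z), but the model reports 3*z, so the ideal temperature is about 3.
random.seed(0)
true_logits = [random.gauss(0.0, 2.0) for _ in range(5000)]
targets = [1.0 if random.random() < 1.0 / (1.0 + math.exp(-z)) else 0.0
           for z in true_logits]
logits = [3.0 * z for z in true_logits]

temperature = fit_temperature(logits, targets)
nll_before = mean_nll(logits, targets, 1.0)
nll_after = mean_nll(logits, targets, temperature)
print(f"fitted T: {temperature:.2f}, NLL before: {nll_before:.3f}, after: {nll_after:.3f}")
```

The search relies on the NLL having a single minimum in T, which holds here because the loss is convex in 1/T. For real segmentation output, the flattened validation logits and labels would take the place of the synthetic lists, and subsampling pixels keeps the pure-Python loop affordable.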