ProgLearn
Supervised Contrastive Loss
Reference issue
Type of change
- Implementing supervised contrastive loss
- Adding a plotting script to compare accuracies and transfer efficiencies
What does this implement/fix?
The supervised contrastive loss explicitly trains the progressive learning network transformer by penalizing samples of different classes that lie close to one another in the learned representation. The new script allows two DNN algorithms to be compared by plotting the difference between their accuracies and transfer efficiencies. The accuracy of the supervised contrastive loss version improves by 6 percent over the PL network trained with categorical cross-entropy.
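For reference, below is a minimal NumPy sketch of the batch-wise supervised contrastive loss (Khosla et al., 2020) that this change is based on. The function name, the default temperature of 0.1, and the standalone NumPy form are illustrative assumptions for this sketch, not the actual Keras code added in this PR.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Batch-wise supervised contrastive loss (Khosla et al., 2020).

    Each anchor is pulled toward other samples of its own class and pushed
    away from samples of different classes in the embedding space.
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)

    # L2-normalize so that dot products are cosine similarities.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z.shape[0]

    # Temperature-scaled pairwise similarities; mask out self-similarity.
    logits = z @ z.T / temperature
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -1e9, logits)

    # Row-wise log-softmax over all other samples in the batch.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives: other samples in the batch that share the anchor's label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)

    # Mean log-probability of the positives, averaged over anchors that
    # have at least one positive in the batch.
    valid = pos_counts > 0
    per_anchor = -np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return (per_anchor[valid] / pos_counts[valid]).mean()

# Toy usage on random embeddings (e.g. penultimate-layer outputs).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 5))
y = np.array([0, 0, 1, 1, 0, 1, 2, 2])
print(supervised_contrastive_loss(emb, y))
```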
Additional information
NDD 2021
Codecov Report
Merging #518 (43b05f7) into staging (634d4d1) will not change coverage. The diff coverage is n/a.
@@           Coverage Diff            @@
##           staging     #518   +/-   ##
========================================
  Coverage    90.09%   90.09%
========================================
  Files            7        7
  Lines          404      404
========================================
  Hits           364      364
  Misses          40       40
@rflperry Does this PR help your query about contrastive loss?
Yeah, it seems to match my results here that the transfer ability goes down, which I find interesting, though I'm still intrigued by the reason why. Is it really worth adding if it's always worse? I forget why I had multiple different results with different labels.
My takeaways/summary:
- Since the decider is k-Nearest Neighbors, we want the learned (penultimate) representation to place samples of the same class close together.
- Contrastive loss learns representations in which same-class samples are close together, and this is validated by the higher accuracy we see from our kNN classifier. Softmax worked, but it wasn't explicitly tuned to learn what we wanted (see the embedding results for various losses here, and the toy sketch after this list). In a way, the best loss would be a function of the network and decider together.
- One slightly odd thing is that the difference in accuracy is non-monotonic (i.e. goes down then up). Maybe just a result of not running enough simulations?
- Despite the accuracy going up, the transfer efficiencies are slightly worse. I'm a bit fuzzy on the details of the transfer efficiency metric, but the learned embeddings are potentially not good for OOD performance (I believe this has been observed for various learned-embedding algorithms such as t-SNE).
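To make the first two points concrete, here is a toy sketch showing that tighter same-class clusters in a penultimate-style embedding translate directly into higher kNN accuracy. This is not ProgLearn's decider code; the class centers, spreads, and k=16 below are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_classes, n_per_class, dim = 3, 200, 16

# Fixed class centers in a made-up "penultimate layer" embedding space.
centers = rng.normal(size=(n_classes, dim))
labels = np.repeat(np.arange(n_classes), n_per_class)

def make_embeddings(within_class_spread):
    """Toy representation with a controllable within-class spread."""
    noise = rng.normal(size=(labels.size, dim))
    return centers[labels] + within_class_spread * noise

for name, spread in [("tight clusters (contrastive-like)", 0.3),
                     ("diffuse clusters (softmax-like)", 1.5)]:
    X = make_embeddings(spread)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.25, random_state=0, stratify=labels)
    knn = KNeighborsClassifier(n_neighbors=16).fit(X_tr, y_tr)
    print(f"{name}: held-out kNN accuracy = {knn.score(X_te, y_te):.3f}")
```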
@PSSF23 fixed!
@PSSF23 Perfect, just made those changes. Thank you!
@PSSF23 Sorry I missed that, it should be good now.