
Supervised Contrastive Loss

Open waleeattia opened this issue 3 years ago • 7 comments

Reference issue

#426

Type of change

  • Implementing supervised contrastive loss
  • Adding a plotting script to compare accuracies and transfer efficiencies

What does this implement/fix?

The supervised contrastive loss trains the progressive learning network's transformer explicitly, by penalizing samples of different classes whose representations lie close to one another. The new script enables two dnn algorithms to be compared by plotting the differences between their accuracies and their transfer efficiencies. The accuracy of the supervised contrastive loss version improves by 6% compared to the PL network trained with categorical cross-entropy.
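For reference, here is a minimal sketch of the kind of supervised contrastive loss involved (along the lines of Khosla et al., 2020), written for a Keras transformer since that is what the dnn-based PL network uses. The function name, temperature default, and masking details are my illustrative assumptions, not necessarily this PR's exact implementation:

```python
import tensorflow as tf

def supervised_contrastive_loss(labels, features, temperature=0.1):
    """Pull same-class embeddings together and push different-class
    embeddings apart in the normalized feature space."""
    # L2-normalize so dot products are cosine similarities.
    features = tf.math.l2_normalize(features, axis=1)
    # Pairwise similarity logits, scaled by the temperature.
    logits = tf.matmul(features, features, transpose_b=True) / temperature
    # Positive-pair mask: 1 where two samples share a class label.
    labels = tf.reshape(labels, [-1, 1])
    positives = tf.cast(tf.equal(labels, tf.transpose(labels)), tf.float32)
    # Drop self-similarity from both the positives and the normalizer.
    n = tf.shape(labels)[0]
    self_mask = 1.0 - tf.eye(n)
    positives = positives * self_mask
    exp_logits = tf.exp(logits) * self_mask
    log_prob = logits - tf.math.log(
        tf.reduce_sum(exp_logits, axis=1, keepdims=True) + 1e-9)
    # For each anchor, average the log-probability over its positive pairs.
    mean_log_prob_pos = tf.reduce_sum(positives * log_prob, axis=1) / tf.maximum(
        tf.reduce_sum(positives, axis=1), 1.0)
    return -tf.reduce_mean(mean_log_prob_pos)
```

And a generic sketch of the comparison the new plotting script is after: per-task differences in accuracy and transfer efficiency between two algorithms. Again, the function and argument names here are placeholders rather than the script's actual interface:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_algorithm_difference(tasks, acc_a, acc_b, te_a, te_b,
                              label_a="contrastive", label_b="cross-entropy"):
    """Plot per-task differences (algorithm A minus algorithm B)."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(tasks, np.asarray(acc_a) - np.asarray(acc_b), marker="o")
    ax1.axhline(0, color="gray", linestyle="--")
    ax1.set_xlabel("task")
    ax1.set_ylabel(f"accuracy ({label_a} - {label_b})")
    ax2.plot(tasks, np.asarray(te_a) - np.asarray(te_b), marker="o")
    ax2.axhline(0, color="gray", linestyle="--")
    ax2.set_xlabel("task")
    ax2.set_ylabel(f"transfer efficiency ({label_a} - {label_b})")
    plt.tight_layout()
    plt.show()
```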

Additional information

NDD 2021

waleeattia avatar Dec 08 '21 23:12 waleeattia

Codecov Report

Merging #518 (43b05f7) into staging (634d4d1) will not change coverage. The diff coverage is n/a.


@@           Coverage Diff            @@
##           staging     #518   +/-   ##
========================================
  Coverage    90.09%   90.09%           
========================================
  Files            7        7           
  Lines          404      404           
========================================
  Hits           364      364           
  Misses          40       40           

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 634d4d1...43b05f7.

codecov[bot] avatar Dec 09 '21 00:12 codecov[bot]

@rflperry Does this PR help your query about contrastive loss?

jdey4 avatar Dec 10 '21 12:12 jdey4

Yeah, it seems to match my results here that the transfer ability goes down, which I find interesting, though I'm still a bit intrigued by the reason why. Not really worth adding if it's just always worse? I forget why I had multiple different results with different labels.

rflperry avatar Dec 10 '21 13:12 rflperry

My takeaways/summary:

  • Since the decider is k-Nearest Neighbors, we want the learned (penultimate) representation to place samples of the same class close together.
  • Contrastive loss learns representations in which same-class samples are close together, and this is validated by the higher accuracy from our kNN classifier (a rough sketch of this check follows the list). Softmax worked, but wasn't explicitly tuned to learn what we wanted (see embedding results for various losses here). In a way, the best loss would be a function of the network and decider together.
  • One slightly odd thing is that the difference in accuracy is non-monotonic (i.e. goes down then up). Maybe just a result of not running enough simulations?
  • Despite the accuracy going up, the transfer efficiencies are slightly worse. I'm a bit fuzzy on the details of the transfer efficiency metric, but potentially the learned embeddings are not good for OOD performance (this has been observed in various learned embedding algorithms like t-SNE, I believe).
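
To make the first two bullets concrete, here is a rough sketch of the check being described: pull the penultimate-layer embeddings out of a trained Keras network and score them with a kNN classifier. The layer indexing and function name are placeholders, not ProgLearn's actual API:

```python
import tensorflow as tf
from sklearn.neighbors import KNeighborsClassifier

def knn_accuracy_on_embeddings(model, x_train, y_train, x_test, y_test, k=16):
    # Truncate the network at its penultimate (representation) layer.
    encoder = tf.keras.Model(inputs=model.input, outputs=model.layers[-2].output)
    z_train = encoder.predict(x_train, verbose=0)
    z_test = encoder.predict(x_test, verbose=0)
    # If same-class samples cluster tightly (as the contrastive loss encourages),
    # the kNN decider's accuracy on these embeddings should go up.
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(z_train, y_train)
    return knn.score(z_test, y_test)
```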

rflperry avatar Dec 11 '21 13:12 rflperry

@PSSF23 fixed!

waleeattia avatar Dec 13 '21 17:12 waleeattia

@PSSF23 Perfect, just made those changes. Thank you!

waleeattia avatar Dec 20 '21 02:12 waleeattia

@PSSF23 Sorry I missed that, it should be good now.

waleeattia avatar Dec 20 '21 03:12 waleeattia