contrastive-predictive-coding
Confusion about the batch size and negative pairs
As discussed in the original paper, training relies on a large number of negative pairs to tighten the InfoNCE lower bound on the mutual information, I(x; c) ≥ log(N) - L_N, which can never exceed log(N). However, in this code the negative pairs are constructed only from within a mini-batch. Why are these in-batch negatives enough?
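To make the question concrete, here is a minimal sketch of in-batch InfoNCE. It is not this repository's code. It assumes a plain dot-product critic between context vectors `c` and encoded targets `z`, and the function name `info_nce_in_batch` is hypothetical. Row i of `z` is the positive for row i of `c`, and the other N - 1 rows of the same mini-batch act as negatives. Because the loss is a cross-entropy over N candidates, it is never negative, so the estimate log(N) - L_N is capped at log(N) by the batch size.

```python
import numpy as np


def info_nce_in_batch(z, c):
    """Hypothetical in-batch InfoNCE with a dot-product critic (assumption).

    z: (N, d) encoded targets; c: (N, d) context vectors.
    Returns (loss, mi_lower_bound) where mi_lower_bound = log(N) - loss.
    """
    scores = c @ z.T  # (N, N) score matrix; positives sit on the diagonal
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    loss = -np.mean(np.diag(log_softmax))  # cross-entropy, correct class = diagonal
    n = z.shape[0]
    mi_lower_bound = np.log(n) - loss  # InfoNCE bound; loss >= 0 so bound <= log(N)
    return loss, mi_lower_bound


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (8, 64, 256):
        c = rng.standard_normal((n, 16))
        z = 5.0 * c  # strongly aligned pairs, so the bound approaches its cap
        loss, bound = info_nce_in_batch(z, c)
        print(f"N={n:4d}  bound={bound:.3f}  log(N)={np.log(n):.3f}")
```

Even with near-perfect alignment, the estimate cannot exceed log(N), so the batch size directly limits how much mutual information the objective can certify.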