Use of Conditional (possibly latent) variables (z) in predictor from original JEPA paper.
Hey Folks,
I have been looking through V-JEPA and its predecessors, and I am trying to find out whether V-JEPA makes use of the conditional variables in the predictor, as I am struggling to tell from the code myself (I am relatively new to ML). There are only limited mentions of it in the I-JEPA and V-JEPA papers, so I was wondering if it is something for future research.
Thanks, Adam
Hi @Sharpz7. It is not obvious in the paper, but JEPA does use a conditional variable in the predictor. Given some encoding, ask yourself: how do I know which targets to predict? The answer is the positional encoding applied to the mask tokens before they are passed through the predictor.
Lines 213 - 217 in jepa/src/models/predictor.py
# These are all of our positional encodings
pos_embs = self.predictor_pos_embed.repeat(B, 1, 1)
# Select the encodings corresponding to the targets we want to predict
pos_embs = apply_masks(pos_embs, masks_tgt)
# Repeat over batch dimension
pos_embs = repeat_interleave_batch(pos_embs, B, repeat=len(masks_ctxt))
# Add to mask tokens before they get passed through predictor
pred_tokens += pos_embs
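For intuition, here is a minimal standalone sketch (NumPy, with hypothetical shapes and a single sample rather than the batched/masked logic above) of why this addition acts as the conditioning variable z: every mask token starts out identical, and only the added positional encoding distinguishes which target the predictor should produce.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim = 16, 8

# One learned mask token, shared by every target position
mask_token = rng.normal(size=(1, dim))

# Fixed positional embeddings, one per patch location
pos_embed = rng.normal(size=(num_patches, dim))

# Suppose we want to predict patches 3, 7, and 12
target_idx = np.array([3, 7, 12])

# Before adding position, all prediction tokens are identical,
# so the predictor has no way to tell the targets apart
pred_tokens = np.repeat(mask_token, len(target_idx), axis=0)
assert np.allclose(pred_tokens[0], pred_tokens[1])

# After adding the positional encodings, each token is unique:
# this per-target information is the conditioning variable z
pred_tokens = pred_tokens + pos_embed[target_idx]
assert not np.allclose(pred_tokens[0], pred_tokens[1])
```

So the "z" from the original JEPA paper is not a separate latent sampled at training time here; it is the target-location information injected through these positional encodings.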