
Is it possible to learn an action model (or an action's effects) with V-JEPA?

Open · aymeric75 opened this issue 7 months ago · 3 comments

Hello,

I would like to know whether it is possible to incorporate knowledge of the actions performed by an agent into the architecture.

From my understanding, the predictor takes as input the encoded unmasked part of the image together with the positions of the masked parts, and predicts the representations of those masked parts. So, as I understand it, the predictor fills in static content (other parts of the same image) rather than predicting next states.
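To make the setup described above concrete, here is a toy, dependency-free sketch of the masked-prediction objective as I understand it. Everything in it is hypothetical (the names `encode`, `predictor`, `masked_prediction_loss`, the toy per-patch encoder, and the averaging predictor); the actual V-JEPA uses ViT encoders, a transformer predictor, and spatiotemporal masks over video clips.

```python
"""Toy sketch of masked prediction in latent space (not the V-JEPA code).

Hypothetical assumptions: an "image" is a list of patch vectors, the encoder
is a fixed toy function, and the predictor just averages context embeddings.
"""
from typing import List, Sequence

Vector = List[float]


def encode(patches: Sequence[Vector]) -> List[Vector]:
    # Hypothetical toy encoder: scales each patch vector by 2.
    return [[2.0 * x for x in p] for p in patches]


def predictor(context: List[Vector], target_positions: List[int]) -> List[Vector]:
    # Hypothetical toy predictor: one prediction per masked position,
    # here simply the mean of the context embeddings (positions unused).
    dim = len(context[0])
    mean = [sum(v[d] for v in context) / len(context) for d in range(dim)]
    return [list(mean) for _ in target_positions]


def l1(a: List[Vector], b: List[Vector]) -> float:
    # Mean absolute error between predicted and target embeddings.
    total = sum(abs(x - y) for va, vb in zip(a, b) for x, y in zip(va, vb))
    return total / (len(a) * len(a[0]))


def masked_prediction_loss(patches: List[Vector], mask: List[bool]) -> float:
    # Split patch indices into visible (context) and masked (target) sets.
    context_idx = [i for i, m in enumerate(mask) if not m]
    target_idx = [i for i, m in enumerate(mask) if m]
    # The context encoder only sees the visible patches.
    context = encode([patches[i] for i in context_idx])
    # The predictor gets the context embeddings plus the masked *positions*.
    preds = predictor(context, target_idx)
    # Targets are embeddings of masked patches of the SAME image (in V-JEPA
    # they come from an EMA copy of the encoder, with no gradient).
    full = encode(patches)
    targets = [full[i] for i in target_idx]
    return l1(preds, targets)


print(masked_prediction_loss([[1.0], [1.0], [3.0], [3.0]], [False, False, True, True]))
```

Note that both the input and the target come from one and the same image, which is why the objective by itself does not model state transitions.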

Would it be possible instead to make JEPA predict the next image, given the current image and an action? Or could the current implementation be used to produce representations suited to this downstream task (i.e., obtaining the "effects" of an action on an image)?
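A minimal sketch of the kind of action-conditioned prediction I have in mind, assuming a frozen (pretrained) encoder and a separate predictor that receives the current latent plus the action and regresses the latent of the next frame. All names (`frozen_encoder`, `toy_env_step`, `LinearPredictor`, `train`) and the toy dynamics are hypothetical, and a squared-error loss is swapped in for simplicity.

```python
"""Toy sketch: an action-conditioned predictor trained in latent space.

Hypothetical setup (not part of the V-JEPA repo): a frozen encoder maps each
frame to a latent vector, and a small predictor learns z_{t+1} ~ P(z_t, a_t)
from (frame_t, action_t, frame_{t+1}) triples. Pure Python.
"""
import random
from typing import List, Tuple

Vector = List[float]


def frozen_encoder(frame: Vector) -> Vector:
    # Stand-in for a pretrained, frozen encoder (identity here, hypothetical).
    return list(frame)


def toy_env_step(state: Vector, action: Vector) -> Vector:
    # Hypothetical dynamics: the action displaces the state.
    return [s + a for s, a in zip(state, action)]


class LinearPredictor:
    """Predicts z_{t+1} from the concatenation [z_t, a_t]."""

    def __init__(self, z_dim: int, a_dim: int) -> None:
        self.in_dim = z_dim + a_dim
        self.w = [[0.0] * self.in_dim for _ in range(z_dim)]
        self.b = [0.0] * z_dim

    def __call__(self, z: Vector, a: Vector) -> Vector:
        x = z + a  # list concatenation: [z_t, a_t]
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi
                for row, bi in zip(self.w, self.b)]

    def sgd_step(self, z: Vector, a: Vector, target: Vector, lr: float) -> float:
        # One SGD step on the squared error in latent space.
        x = z + a
        err = [p - t for p, t in zip(self(z, a), target)]
        for i, e in enumerate(err):
            for j, xj in enumerate(x):
                self.w[i][j] -= lr * 2.0 * e * xj
            self.b[i] -= lr * 2.0 * e
        return sum(e * e for e in err)


def train(steps: int = 2000, seed: int = 0) -> Tuple[LinearPredictor, float]:
    rng = random.Random(seed)
    model = LinearPredictor(z_dim=2, a_dim=2)
    loss = 0.0
    for _ in range(steps):
        frame = [rng.uniform(-1, 1) for _ in range(2)]
        action = [rng.uniform(-1, 1) for _ in range(2)]
        next_frame = toy_env_step(frame, action)
        # The target is the encoding of the NEXT frame, not a masked part
        # of the current frame.
        z_t, z_next = frozen_encoder(frame), frozen_encoder(next_frame)
        loss = model.sgd_step(z_t, action, z_next, lr=0.05)
    return model, loss


model, final_loss = train()
print(model([0.2, -0.1], [0.3, 0.4]))  # should be close to [0.5, 0.3]
```

Keeping the encoder frozen corresponds to the second option in the question (reusing existing representations for a downstream task); training the encoder jointly with the action-conditioned predictor would correspond to the first.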

Thanks a lot

aymeric75, Jun 26 '24 06:06