fairseq
Some misalignment of data2vec v2 between code and paper
❓ Questions and Help
Before asking:
This point should be stated explicitly in the data2vec v2 paper rather than explained only roughly in a few phrases. As it stands, the paper does not give sufficient information.
What is your question?
Why does the inverse mask trick "enable the student model to build semantically rich representations over local regions of the sample"? The masking ratio (MR) and the preserving ratio (PR) are fixed (1 - MR = PR), so whichever way it is implemented, the result should be the same, shouldn't it? Then why does the inverse mask trick work?
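To make the question concrete, here is a minimal sketch (my own toy example, not taken from the paper or from fairseq) of two masks with the same preserving ratio but different spatial layouts. The grid size, the two layouts, and the helper names `preserve_ratio` and `kept_with_kept_neighbor` are all assumptions made for illustration. Whether this kind of layout difference is what the paper means is exactly what I am asking.

```python
def preserve_ratio(mask):
    """Fraction of kept patches (0 = kept, 1 = masked)."""
    total = sum(len(row) for row in mask)
    kept = sum(row.count(0) for row in mask)
    return kept / total


def kept_with_kept_neighbor(mask):
    """Count kept patches that have at least one kept 4-neighbor."""
    d = len(mask)
    count = 0
    for r in range(d):
        for c in range(d):
            if mask[r][c] != 0:
                continue
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if any(
                0 <= nr < d and 0 <= nc < d and mask[nr][nc] == 0
                for nr, nc in neighbors
            ):
                count += 1
    return count


d = 6  # hypothetical 6 x 6 patch grid

# Layout 1: the kept patches form one contiguous 3 x 3 block.
block_mask = [
    [0 if (1 <= r <= 3 and 1 <= c <= 3) else 1 for c in range(d)]
    for r in range(d)
]

# Layout 2: the same number of kept patches, spread on a stride-2 lattice.
scattered_mask = [
    [0 if (r % 2 == 0 and c % 2 == 0) else 1 for c in range(d)]
    for r in range(d)
]

print(preserve_ratio(block_mask), preserve_ratio(scattered_mask))
print(kept_with_kept_neighbor(block_mask), kept_with_kept_neighbor(scattered_mask))
```

Both layouts keep 9 of 36 patches, so MR and PR are identical, yet only the first layout leaves kept patches next to each other.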
Code
Besides, only the vision config has an inverse mask option, although the other modalities could potentially support it (I guess). For example, the text modality just keeps the preserved part directly. So here is a quick review of the code:
import torch

# mask_length=3: each block covers 9 patches (mask_length x mask_length)
# `overlapping` and `mask_dropout` are written as explicit parameters here
# so that this excerpt runs on its own
def compute_block_mask_2d(shape, mask_prob=0.8, mask_length=3, mask_prob_adjust=0.07,
                          inverse_mask=True, overlapping=True, mask_dropout=0.0):
    B, L = shape  # batch size, number of patches
    d = int(L**0.5)  # side length of the square patch grid
    if inverse_mask:
        # what is the point of this, compared with setting mask_prob = 0.2
        # without enabling inverse mask?
        mask_prob = 1 - mask_prob
    if overlapping:  # default is overlapping mask
        mask = torch.zeros((B, d, d))
        mask_inds = torch.randint(
            0,
            L,
            size=(  # paper formula = L * ((1-R)+A) / B; note the notation is different
                B,
                int(
                    L
                    * ((mask_prob + mask_prob_adjust) / mask_length**2)
                    * (1 + mask_dropout)
                ),
            ),
        )
        # scatter the block centers
        mask.view(B, -1).scatter_(1, mask_inds, 1)
        centers = mask.nonzero(as_tuple=True)
        inds = ([], [], [])
        # fill the mask_length x mask_length neighborhood of each center with 1
        offset = mask_length // 2
        for i in range(mask_length):
            for j in range(mask_length):
                k1 = i - offset
                k2 = j - offset
                # batch dim
                inds[0].append(centers[0])
                # x-axis coordinates
                inds[1].append(centers[1] + k1)
                # y-axis coordinates
                inds[2].append(centers[2] + k2)
        i0 = torch.cat(inds[0])
        i1 = torch.cat(inds[1]).clamp_(min=0, max=d - 1)
        i2 = torch.cat(inds[2]).clamp_(min=0, max=d - 1)
        # masking
        mask[(i0, i1, i2)] = 1
        return mask  # the excerpt stops here
    raise NotImplementedError("the non-overlapping branch is not shown in this excerpt")
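As a rough check of the `size=` expression in the snippet above, the sketch below reproduces only that arithmetic in plain Python and compares the number of sampled block centers with and without `inverse_mask`. The helper name `num_block_centers` is hypothetical, and `L = 196` (a 14 x 14 patch grid) is an assumed example value, not something taken from the config.

```python
def num_block_centers(L, mask_prob, mask_length, mask_prob_adjust=0.0,
                      inverse_mask=False, mask_dropout=0.0):
    """Mirror the `size=` arithmetic of the snippet: how many centers are sampled."""
    if inverse_mask:
        mask_prob = 1 - mask_prob
    return int(
        L
        * ((mask_prob + mask_prob_adjust) / mask_length**2)
        * (1 + mask_dropout)
    )


L = 196  # assumed example: 14 x 14 patches
mask_length = 3

with_inverse = num_block_centers(L, 0.8, mask_length, 0.07, inverse_mask=True)
without_inverse = num_block_centers(L, 0.8, mask_length, 0.07, inverse_mask=False)

# Each center can set at most mask_length**2 patches to 1. Overlapping blocks,
# repeated centers, and clamping at the border can only lower that count.
upper_with_inverse = with_inverse * mask_length**2
upper_without_inverse = without_inverse * mask_length**2

print(with_inverse, upper_with_inverse)
print(without_inverse, upper_without_inverse)
```

With these assumed values, `inverse_mask=True` samples far fewer centers, so the blocks written by the loop cover only a small part of the grid, while `inverse_mask=False` with the same `mask_prob` covers most of it.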
What have you tried?
I read the code and the paper.
What's your environment?
Not relevant to this question.
The same question applies to audio. Although inverse_mask plays an important role in the paper, "model.modalities.audio.inverse_mask" in "example/data2vec/config/v2/base&large_audio_only_task.yaml" is false by default in the official code.