Some mismatches between the data2vec v2 code and paper

Open · HuangChiEn opened this issue 2 years ago · 1 comment

❓ Questions and Help

Before asking:

This point should be explained explicitly in the data2vec v2 paper rather than sketched in a few phrases. As it stands, the documentation (the paper) does not give enough information.

What is your question?

Why does the inverse mask trick "enable the student model to build semantically rich representations over local regions of the sample"? The masking ratio (MR) and the preserving ratio (PR) are fixed (1 - MR = PR), so whichever way you implement it the result should be the same, shouldn't it? Then why does the inverse mask trick work?

Code

Besides, only the vision config has the inverse mask option, although the other modalities could potentially support it (I guess). For example, the text modality just keeps the preserved part directly. So here is a quick review of the code:

    import torch

    # Simplified excerpt. With mask_length=3, each block covers 9 patches (mask_length x mask_length).
    # mask_dropout and overlapping were undefined in my excerpt; they are explicit parameters here.
    def compute_block_mask_2d(shape, mask_prob=0.8, mask_length=3, mask_prob_adjust=0.07,
                              inverse_mask=True, mask_dropout=0.0, overlapping=True):
        B, L = shape     # batch size, number of patches
        d = int(L**0.5)  # side length of the square patch grid
        if inverse_mask:
            # what is the difference from setting mask_prob = 0.2 without enabling inverse_mask?
            mask_prob = 1 - mask_prob
        if overlapping:  # overlapping blocks are the default
            mask = torch.zeros((B, d, d))
            mask_inds = torch.randint(
                0,
                L,
                size=(  # paper formula = L * ((1-R)+A) / B; note the notation is different
                    B,
                    int(
                        L
                        * ((mask_prob + mask_prob_adjust) / mask_length**2)
                        * (1 + mask_dropout)
                    ),
                ),
            )
            # scatter the block centers onto the flattened grid
            mask.view(B, -1).scatter_(1, mask_inds, 1)
            centers = mask.nonzero(as_tuple=True)

            inds = ([], [], [])

            # set the mask_length x mask_length neighbourhood of each center to 1
            offset = mask_length // 2
            for i in range(mask_length):
                for j in range(mask_length):
                    k1 = i - offset
                    k2 = j - offset
                    # batch indices
                    inds[0].append(centers[0])
                    # coordinates along the first grid axis
                    inds[1].append(centers[1] + k1)
                    # coordinates along the second grid axis
                    inds[2].append(centers[2] + k2)

            i0 = torch.cat(inds[0])
            i1 = torch.cat(inds[1]).clamp_(min=0, max=d - 1)
            i2 = torch.cat(inds[2]).clamp_(min=0, max=d - 1)
            # masking
            mask[(i0, i1, i2)] = 1
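
To make the question concrete, here is a minimal dependency-free sketch (plain Python instead of torch) that compares the two modes on a single 14 x 14 grid. The names `block_mask` and `frac_visible_with_visible_neighbour` are hypothetical helpers, not fairseq functions. The final flip of the mask when `inverse_mask=True` is an assumption about what "inverse" implies, since that step is not part of the excerpt above. The sketch only measures how the visible patches are laid out; it does not claim to explain why the trick helps training.

```python
import random


def block_mask(d, mask_prob, mask_length=3, mask_prob_adjust=0.0,
               inverse_mask=False, seed=0):
    """Return a d x d grid of 0/1 (1 = masked). Hypothetical re-sketch of the excerpt."""
    rng = random.Random(seed)
    L = d * d
    # In inverse mode the blocks are sampled for the *preserved* region.
    p = 1 - mask_prob if inverse_mask else mask_prob
    n_centers = int(L * (p + mask_prob_adjust) / mask_length**2)
    grid = [[0] * d for _ in range(d)]
    offset = mask_length // 2
    for _ in range(n_centers):
        r, c = divmod(rng.randrange(L), d)
        for i in range(mask_length):
            for j in range(mask_length):
                # clamp to the grid, as clamp_(min=0, max=d - 1) does in the excerpt
                rr = min(max(r + i - offset, 0), d - 1)
                cc = min(max(c + j - offset, 0), d - 1)
                grid[rr][cc] = 1
    if inverse_mask:
        # ASSUMPTION: the sampled blocks become the visible region (mask flipped).
        grid = [[1 - v for v in row] for row in grid]
    return grid


def frac_visible_with_visible_neighbour(grid):
    """Fraction of visible (0) patches with at least one visible 4-neighbour."""
    d = len(grid)
    visible = [(r, c) for r in range(d) for c in range(d) if grid[r][c] == 0]
    if not visible:
        return 0.0

    def has_neighbour(r, c):
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < d and 0 <= cc < d and grid[rr][cc] == 0:
                return True
        return False

    return sum(has_neighbour(r, c) for r, c in visible) / len(visible)


if __name__ == "__main__":
    for inverse in (False, True):
        g = block_mask(14, mask_prob=0.8, inverse_mask=inverse)
        masked = sum(map(sum, g)) / (14 * 14)
        print("inverse_mask =", inverse,
              "| masked fraction =", round(masked, 3),
              "| visible patches with a visible neighbour =",
              round(frac_visible_with_visible_neighbour(g), 3))
```

Under these assumptions, inverse mode always leaves the visible patches in contiguous blocks, whereas in direct mode the visible patches are whatever the masked blocks happen not to cover. The realised masked fraction can also differ between the two modes because overlapping blocks cover fewer patches than the formula suggests, which may be part of what `mask_prob_adjust` is compensating for.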

What have you tried?

read the code and paper..

What's your environment?

not important..

HuangChiEn · Mar 23 '23 02:03

I have the same question for audio. Although inverse_mask plays an important role in the paper, "model.modalities.audio.inverse_mask" in "example/data2vec/config/v2/base&large_audio_only_task.yaml" is false by default in the official code.

lazerliu · Dec 16 '23 07:12