[TASK] Implement PredictMasked (BERT-like masking)

Open marcromeyn opened this issue 3 years ago • 0 comments

BERT-like masking

Given a sequence ABCDE, BERT-like masking would produce the following:

Inputs:  A MASKED C MASKED E
Targets: MASKED B MASKED D MASKED

Note that the number of target items may differ between samples in the batch. We should ensure that each sample has at least one target and at most len(seq) - 1 targets, so that at least one item always remains visible in the inputs.
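To make the constraint concrete, here is a minimal plain-Python sketch of the selection and masking steps for a single sequence. It is not the proposed Keras implementation: the helper names `select_positions` and `apply_mask`, the per-position sampling, and the `"MASKED"` placeholder are all assumptions made for illustration.

```python
import random

MASK = "MASKED"  # placeholder token, matching the example above (assumption)


def select_positions(seq_len, selection_rate, rng):
    """Pick positions to predict: at least 1, at most seq_len - 1 (hypothetical helper)."""
    if seq_len < 2:
        raise ValueError("need at least two items to keep one input and one target")
    # Select each position independently with probability `selection_rate`.
    selected = [i for i in range(seq_len) if rng.random() < selection_rate]
    # Guarantee at least one target.
    if not selected:
        selected = [rng.randrange(seq_len)]
    # Guarantee at most seq_len - 1 targets, so one item stays visible.
    if len(selected) == seq_len:
        selected.remove(rng.choice(selected))
    return set(selected)


def apply_mask(seq, positions):
    """Build (inputs, targets): selected items move from inputs to targets."""
    inputs = [MASK if i in positions else item for i, item in enumerate(seq)]
    targets = [item if i in positions else MASK for i, item in enumerate(seq)]
    return inputs, targets


# Reproduces the ABCDE example when positions 1 and 3 are selected.
inputs, targets = apply_mask(list("ABCDE"), {1, 3})
print(inputs)   # ['A', 'MASKED', 'C', 'MASKED', 'E']
print(targets)  # ['MASKED', 'B', 'MASKED', 'D', 'MASKED']
```

Because positions are sampled per sequence, two samples of the same length can end up with different numbers of targets, which is why the bounds are enforced per sample rather than per batch.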

The class that’s responsible for masking could roughly look like:

class PredictMasked(DataAugmentation):
	def __init__(
		self,
		schema: Schema,
		target: Union[str, Tag, ColSchema],
		# Required arguments must precede arguments with defaults.
		mask_selection_rate,
		mask_selection_length,
		prediction_block=None,
		# Tuple instead of a list to avoid a shared mutable default.
		unselectable_token_ids=(0,),
		mask_token_rate=0.8,
		random_token_rate=0.1,
	):
		...

	def compute_mask(self, ...):
		...
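The defaults `mask_token_rate=0.8` and `random_token_rate=0.1` match the replacement scheme of the original BERT recipe, where a selected position is replaced by the mask token 80% of the time, by a random token 10% of the time, and left unchanged otherwise. Assuming that is the intent here, the choice for one selected token could be sketched as below. The function name `choose_replacement`, the `mask_token_id` and `vocab_size` parameters, and the treatment of id 0 as padding are assumptions, not part of the proposed API.

```python
import random


def choose_replacement(token_id, mask_token_id, vocab_size,
                       mask_token_rate=0.8, random_token_rate=0.1, rng=None):
    """Decide what a selected token becomes in the inputs (hypothetical helper)."""
    rng = rng or random.Random()
    r = rng.random()
    if r < mask_token_rate:
        # Most of the time: replace with the mask token.
        return mask_token_id
    if r < mask_token_rate + random_token_rate:
        # Sometimes: replace with a random token.
        # Assumption: id 0 is padding, so it is never drawn.
        return rng.randrange(1, vocab_size)
    # Otherwise: keep the original token in the inputs.
    return token_id
```

The target for a selected position is the original token in all three cases; only the input side varies.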

We want to make use of standard Keras functionality w.r.t. masking. Some useful links:

marcromeyn, Aug 30 '22 08:08