Vivi

Results 2 issues of Vivi

In some cases, such as when the sequence length is short or the value of mask_prob is small, the whole training sequence ends up with no masked positions,...
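To make the concern concrete, here is a minimal sketch, assuming each position is masked independently with probability `mask_prob` (the truncated issue does not confirm how the repository samples its mask). The helper names `prob_no_mask` and `sample_mask` are hypothetical, and the fallback of forcing one masked position is only one possible workaround, not the project's behavior.

```python
import random


def prob_no_mask(seq_len, mask_prob):
    """Chance that independent per-position masking leaves the sequence untouched."""
    return (1.0 - mask_prob) ** seq_len


def sample_mask(seq_len, mask_prob, rng=random):
    """Return a list of booleans; True marks a masked position.

    Assumes seq_len >= 1. If no position was selected, one position is
    forced to be masked (an illustrative fallback, not the repository's code).
    """
    mask = [rng.random() < mask_prob for _ in range(seq_len)]
    if not any(mask):
        mask[rng.randrange(seq_len)] = True
    return mask


# A short sequence with a small mask_prob is left unmasked fairly often.
print(prob_no_mask(5, 0.2))
print(sample_mask(5, 0.2))
```

Under this assumption the chance of an unmasked sequence is `(1 - mask_prob) ** seq_len`, which is why short sequences and small `mask_prob` values trigger the situation described above.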

```
tl = seqs.shape[1] # time dim len for enforce causality
attention_mask = ~torch.tril(torch.ones((tl, tl), dtype=torch.bool, device=self.dev))
```

I can't understand why the attention_mask is this shape. Can you give...
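A small sketch may help with the shape question. It rebuilds the same expression for a hypothetical sequence length of 4 and drops `device=self.dev` so it runs on its own. The interpretation in the comments assumes the mask is passed as `attn_mask` to `torch.nn.MultiheadAttention`, where a boolean `True` means "not allowed to attend"; the snippet above does not show where the mask is used.

```python
import torch

tl = 4  # hypothetical sequence length, standing in for seqs.shape[1]

# torch.tril keeps the lower triangle (including the diagonal) as True;
# the ~ flips it, so True remains only strictly above the diagonal.
attention_mask = ~torch.tril(torch.ones((tl, tl), dtype=torch.bool))

# Row i is the query position, column j is the key position.
# attention_mask[i, j] is True exactly when j > i, i.e. when position i
# would be looking at a later position j.
print(attention_mask.shape)
print(attention_mask)
```

The mask is `(tl, tl)` rather than `(tl,)` because attention scores are computed for every pair of query and key positions, so causality needs one allow/block entry per pair.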