Kassa

Results: 22 issues by Kassa

Why do I get different results between eval mode and test mode? ![image](https://user-images.githubusercontent.com/30862458/168093289-78c45139-a53c-43f2-939b-56e67a66c400.png)

help wanted

Where is the code for the 4 pre-training tasks? I cannot find it.

The formula of the information bottleneck is ![image](https://user-images.githubusercontent.com/30862458/171373111-e852350a-ef63-455e-b181-4d805cda47e8.png). But I cannot find it in this paper. The loss function of MIB seems to be derived from a definition proposed by the authors...
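For reference, the classic Information Bottleneck objective (Tishby, Pereira and Bialek) is usually written as minimizing I(X;Z) − β·I(Z;Y) over an encoder p(z|x), under the Markov chain Z ← X → Y. Whether the screenshot above shows exactly this form is an assumption. The sketch below evaluates that Lagrangian for small discrete distributions; the helper names `mutual_information` and `ib_objective` are hypothetical and not taken from the paper.

```python
import math

def mutual_information(joint):
    """I(A;B) in nats for a discrete joint distribution given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                mi += p * math.log(p / (pa[a] * pb[b]))
    return mi

def ib_objective(p_x, p_y_given_x, p_z_given_x, beta):
    """Classic IB Lagrangian I(X;Z) - beta * I(Z;Y) (to be minimized over p(z|x)).
    Assumes Z depends on Y only through X, i.e. the Markov chain Z <- X -> Y."""
    n_x, n_y, n_z = len(p_x), len(p_y_given_x[0]), len(p_z_given_x[0])
    # Joint p(x, z) = p(x) p(z|x)
    p_xz = [[p_x[x] * p_z_given_x[x][z] for z in range(n_z)] for x in range(n_x)]
    # Joint p(z, y) = sum_x p(x) p(z|x) p(y|x)
    p_zy = [[sum(p_x[x] * p_z_given_x[x][z] * p_y_given_x[x][y] for x in range(n_x))
             for y in range(n_y)] for z in range(n_z)]
    return mutual_information(p_xz) - beta * mutual_information(p_zy)
```

With this sign convention, a larger β rewards keeping information about Y, while β close to 0 pushes the encoder toward compressing X (a constant encoder gives an objective of 0).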

Your work is really interesting! But all the datasets are really huge, and if I just want to learn the Bi-Treelstm model to encode the context part, what can I...

In Section 4.1, you mention an example about trees. How can I compute FGW on trees? Could you release the relevant code for Section 4.1? It would be...
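Assuming FGW here means the Fused Gromov-Wasserstein distance, a tree is commonly fed to it as node features plus a structure matrix, for instance pairwise shortest-path (hop) distances. The sketch below only builds that structure matrix and evaluates the FGW objective for a fixed coupling `T`; the actual distance minimizes this objective over all couplings with the given node weights (the POT library's `ot.gromov.fused_gromov_wasserstein` is one solver for that). The hop-count distances, the square loss on both terms, and the helper names `tree_shortest_paths` and `fgw_cost` are assumptions for illustration, not the paper's code.

```python
from collections import deque

def tree_shortest_paths(n, edges):
    """All-pairs hop distances for an unweighted tree on nodes 0..n-1 (BFS from each node)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[0] * n for _ in range(n)]
    for s in range(n):
        seen = [False] * n
        seen[s] = True
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    dist[s][w] = dist[s][u] + 1
                    queue.append(w)
    return dist

def fgw_cost(feat1, feat2, C1, C2, T, alpha=0.5):
    """FGW objective for a FIXED coupling T (square loss on both terms):
    (1 - alpha) * sum_ij M_ij T_ij
      + alpha * sum_{i,j,i2,j2} (C1[i][i2] - C2[j][j2])^2 T_ij T_{i2 j2},
    where M_ij is the squared Euclidean distance between node features."""
    n1, n2 = len(feat1), len(feat2)
    M = [[sum((a - b) ** 2 for a, b in zip(feat1[i], feat2[j])) for j in range(n2)]
         for i in range(n1)]
    feature_term = sum(M[i][j] * T[i][j] for i in range(n1) for j in range(n2))
    structure_term = sum(
        (C1[i][i2] - C2[j][j2]) ** 2 * T[i][j] * T[i2][j2]
        for i in range(n1) for j in range(n2)
        for i2 in range(n1) for j2 in range(n2)
    )
    return (1 - alpha) * feature_term + alpha * structure_term
```

For two identical trees matched node-to-node (a diagonal coupling with uniform weights), both terms vanish and the cost is 0; any coupling that mixes dissimilar nodes yields a positive cost.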

From the core formula, SimCTG simply replaces the positive sample with the current token itself and the negative sample with the previous token.
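As we read the SimCTG paper (Su et al., 2022), its token-level contrastive loss is a margin loss with cosine similarity s and margin ρ, averaged over pairs (i, j ≠ i) of max(0, ρ − s(h_i, h_i) + s(h_i, h_j)); there every other token of the sequence acts as a negative, not only preceding ones. Please double-check this against the paper. The sketch below implements that form and adds a hypothetical `previous_only` flag that restricts negatives to j < i, matching the reading above; the function names are illustrative only.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def simctg_style_loss(hidden, rho=0.5, previous_only=False):
    """Margin-based token-level contrastive loss over one sequence of hidden states.
    Positive score for token i: s(h_i, h_i) (= 1 for non-zero vectors).
    Negatives: all j != i, or only j < i when previous_only=True (hypothetical flag).
    Returns the mean of max(0, rho - s(h_i, h_i) + s(h_i, h_j)) over the chosen pairs."""
    total, pairs = 0.0, 0
    for i in range(len(hidden)):
        self_sim = cosine(hidden[i], hidden[i])
        for j in range(len(hidden)):
            if j == i or (previous_only and j > i):
                continue
            total += max(0.0, rho - self_sim + cosine(hidden[i], hidden[j]))
            pairs += 1
    return total / pairs if pairs else 0.0
```

Tokens whose representations are already at least ρ apart in cosine similarity contribute nothing, so the loss only pushes apart token representations that are too similar.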

Why are the rotary position encodings (RoPE) applied to only 64 dimensions of each head rather than to the full head dimension?
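The question can be made concrete with a sketch of *partial* rotary embeddings, as used for example by GPT-J through its `rotary_dim` setting: only the first `rotary_dim` features of each head are rotated by a position-dependent angle, and the remaining features pass through unchanged, so they carry no positional signal. The function name, the interleaved pairing of features, and `base=10000.0` are assumptions for illustration, not necessarily the implementation the question refers to.

```python
import math

def apply_partial_rope(x, pos, rotary_dim, base=10000.0):
    """Apply RoPE to the first `rotary_dim` features of one head vector `x`.
    Features are rotated in interleaved pairs (x[0], x[1]), (x[2], x[3]), ...;
    features from index rotary_dim onward are returned unchanged."""
    assert rotary_dim % 2 == 0 and rotary_dim <= len(x)
    out = list(x)
    for i in range(0, rotary_dim, 2):
        theta = pos * base ** (-i / rotary_dim)  # angle for pair i // 2 at this position
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because each pair is rotated by an angle proportional to the position, the dot product of a rotated query and key over the rotary features depends only on their relative offset, while the untouched features contribute a position-independent term.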