YovaKem

Results: 12 issues

Is the generator counterpart to the dbmdz/electra-small-turkish-cased-discriminator model available? Thanks!
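For reference, this is how I would expect to load it, assuming the checkpoint follows the usual naming convention (the generator model id below is a guess by analogy with the discriminator's name, not confirmed to exist):

```python
from transformers import ElectraForMaskedLM, ElectraTokenizerFast

# Assumed model id, mirroring the discriminator's naming; not confirmed to exist.
model_id = "dbmdz/electra-small-turkish-cased-generator"
tokenizer = ElectraTokenizerFast.from_pretrained(model_id)
generator = ElectraForMaskedLM.from_pretrained(model_id)
```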

The OSCAR paper reports CIDEr scores of 78.8 and 80.9 for OSCAR base and OSCAR large, respectively. Since it's not clarified, I assume these are scores on the NoCaps test...

I trained an ELECTRA-small model following the README instructions exactly, and the resulting model's downstream performance is substantially worse than that of google/electra-small-discriminator on HuggingFace. Apart from the training...

The README says ELECTRA-small has 256 hidden units and ELECTRA-base has 768. [Here](https://github.com/google-research/electra/blob/master/configure_pretraining.py) the embedding size for a small model is set to 128 and for a base model...
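For illustration, the distinction maps onto a HuggingFace config roughly as follows (a sketch; the correspondence between the TF settings and these HF field names is my assumption):

```python
from transformers import ElectraConfig

# Small-model dimensions as described above: the README's "hidden units"
# versus the embedding size set in configure_pretraining.py.
small = ElectraConfig(
    embedding_size=128,  # token embedding width (configure_pretraining.py)
    hidden_size=256,     # Transformer hidden width (README's "256 hidden units")
)
```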

In the paper you say "On the other hand, tying all encoder weights caused little improvement while incurring the significant disadvantage of requiring the generator and discriminator to be...
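For concreteness, tying only the embeddings (as opposed to all encoder weights) looks roughly like this in the HuggingFace ports; this is a sketch, not the original TF implementation:

```python
from transformers import ElectraForMaskedLM, ElectraForPreTraining

generator = ElectraForMaskedLM.from_pretrained("google/electra-small-generator")
discriminator = ElectraForPreTraining.from_pretrained("google/electra-small-discriminator")

# Share only the embedding tables; the two encoders otherwise stay independent,
# so their hidden sizes need not match.
generator.electra.embeddings = discriminator.electra.embeddings
```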

"In contrast to the original paper, the generated trees are always fitted with the same maximum depth. In the original implementation the maximum depth of the tree are drawn from...

Can you tell me what hyperparameters were used for beam search at inference time, and whether any length or repetition penalties were applied? Thanks!
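For clarity, these are the generation knobs I mean, shown with placeholder values on a stand-in GPT-2 model (the actual settings are exactly what I'm asking about):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model for illustration
model = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("Hello", return_tensors="pt").input_ids
outputs = model.generate(
    input_ids,
    num_beams=4,              # placeholder
    length_penalty=1.0,       # placeholder
    repetition_penalty=1.0,   # placeholder
    no_repeat_ngram_size=0,   # placeholder
    max_new_tokens=40,        # placeholder
)
```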

Running the evaluation in the following fashion:

```
./generate.py --model jkulhanek/augpt-mw-21 --dataset multiwoz-2.1-test --file predictions.txt
./evaluate_multiwoz.py --file predictions.txt --dataset multiwoz-2.1-test
```

I get inform and success results somewhat lower than...

Perhaps the data version should not be hardcoded [here](https://github.com/ufal/augpt/blob/fa8a57961ed1d8fe6099978c489c0b0f8956d64e/train_multiwoz.py#L43) in case one is training on MultiWOZ 2.0.
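Something along these lines, for instance (an illustrative sketch, not a patch against the actual file; the argument name and default are hypothetical):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", default="multiwoz-2.1-train")
args = parser.parse_args()

# Derive the version from the dataset name instead of hardcoding it,
# e.g. "multiwoz-2.0-train" -> "2.0", "multiwoz-2.1-train" -> "2.1".
version = args.dataset.split("-")[1]
```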