YovaKem comments

Results: 13 comments of YovaKem

ValueError: setting an array element with a sequence.

I reckon you are no longer concerned with this issue, but for future reference of other users, I'll share what turned out to be the issue in my case: I...
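The comment above is cut off, so the cause it describes is not shown here. As a general illustration only: one common way to hit this NumPy error is converting nested lists whose rows differ in length into a numeric array. The sketch below uses a hypothetical helper, `find_ragged_rows`, that reports such rows before any conversion is attempted. It relies only on the standard library and is not taken from the original thread.

```python
from typing import List, Sequence, Tuple


def find_ragged_rows(rows: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (index, length) for every row whose length differs from the first row.

    Hypothetical helper for illustration: an empty result means the rows are
    rectangular and can be stacked into a 2-D array.
    """
    if not rows:
        return []
    expected = len(rows[0])
    return [(i, len(row)) for i, row in enumerate(rows) if len(row) != expected]


if __name__ == "__main__":
    # Assumed sample data: the second row is one element short.
    features = [[0.1, 0.2, 0.3], [0.4, 0.5], [0.6, 0.7, 0.8]]
    print(find_ragged_rows(features))
```

If the helper returns a non-empty list, padding or truncating the listed rows to a common length is one possible fix, depending on what the data represents.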

Electra-small embedding size

From experience, I can say they're not. I tried training a smaller ELECTRA model setting the embedding size (the only hyperparameter concerning size that could be set in the bash...

Looking at the details provided [here](https://aclanthology.org/attachments/N18-1162.Notes.pdf), my guess is that the correct commands would be

```
python train.py --data=cornell --model=VHCR --batch_size=80 --sentence_drop=0.25 --kl_annealing_iter=15000
python eval.py --data=cornell --model=VHCR --checkpoint=
```
...

RWKV4neo

> I have a conversion script + draft that has consistent logit ordering with the official implementation here:
>
> conversion script: https://github.com/tensorpro/transformers/blob/main/src/transformers/models/rwkv4_neo/convert_rwkv_original_pytorch_checkpoint_to_pytorch.py
> model in torch: https://github.com/tensorpro/transformers/blob/main/src/transformers/models/rwkv4_neo/modeling_rwkv4_neo.py
>
> I...

Add RWKV-4

Hi @sgugger, thanks A TON for this merge! I am trying to train a new model of type and facing the following error:

```
Traceback (most recent call last):
  File...
```

Add RWKV-4

I managed to get the code to run with some changes to the `forward()` and `backward()` functions:

```python
class RwkvLinearAttention(torch.autograd.Function):
    @staticmethod
    def forward(ctx, time_decay, time_first, key, value, state=None, return_state=False):
        batch_size,...
```

Add RWKV-4

Thanks @Blealtan! I guess you meant `k` for `key`? I added bf16 support for `g_time_first` (I get an error otherwise) and put the tensors on CUDA:

```python
# The CUDA...
```

Faulty import

Or rather it should be changed to reflect the class call [here](https://github.com/ufal/augpt/blob/fa8a57961ed1d8fe6099978c489c0b0f8956d64e/train_multiwoz.py#L23).

Results with public checkpoints

Ok, I see. But the code is now up to date and I can assume I'm getting the right results?

Results with public checkpoints

Thanks. And can I ask for a clarification on why responses are [not lexicalized](https://github.com/ufal/augpt/blob/fa8a57961ed1d8fe6099978c489c0b0f8956d64e/generate.py#L168) in `generate.py`? Does that also have to do with keeping the evaluation procedure consistent across works?
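For context on the question above: lexicalization means filling the slot placeholders of a generated (delexicalized) response with concrete values, for example from a database result. A minimal sketch follows. The `[slot_name]` placeholder syntax, the `lexicalize` function and the sample values are assumptions made for illustration and are not taken from AuGPT's `generate.py`.

```python
import re
from typing import Mapping

# Assumed placeholder format: lowercase slot names in square brackets.
PLACEHOLDER = re.compile(r"\[([a-z_]+)\]")


def lexicalize(response: str, values: Mapping[str, str]) -> str:
    """Replace each [slot] placeholder with its value; unknown slots stay unchanged."""

    def fill(match: re.Match) -> str:
        # Fall back to the original placeholder text when no value is known.
        return values.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(fill, response)


if __name__ == "__main__":
    delexicalized = "[restaurant_name] is in the [area] part of town."
    print(lexicalize(delexicalized, {"restaurant_name": "Curry Garden", "area": "centre"}))
```

Evaluating on delexicalized text, as the linked line appears to do, would compare responses before this substitution step.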