Pablo Pernias

Results 31 comments of Pablo Pernias

I used my own implementation

The samples I shared used temperature annealing as well, but I still don't get very good results.

I have not tried anything like BEiT; in fact, my architecture is pretty different from the one proposed in the paper. What I tried to follow as closely as possible...

I have it as a parameter in my sampling function. I also tried different schedules for the annealing (linear, cosine, etc.), with very similar results, so I don't exactly remember...
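To give a rough idea, a minimal sketch of such an annealing parameter (the helper name, default range, and schedules are illustrative, not my exact code):

```python
import math

def annealed_temperature(step, total_steps, t_start=1.0, t_end=0.1, mode="linear"):
    # Hypothetical helper: interpolates the sampling temperature from t_start
    # down to t_end over the iterative decoding steps.
    progress = step / max(total_steps - 1, 1)
    if mode == "linear":
        return t_start + (t_end - t_start) * progress
    elif mode == "cosine":
        # Cosine schedule: changes slowly at the start and end, fast in the middle.
        return t_end + 0.5 * (t_start - t_end) * (1 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {mode}")
```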

It shouldn't be relevant, but my model is conditional, so I add an identity embedding to the input. My goal was to be able to control to some degree the...
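A minimal sketch of that kind of conditioning (the module, names, and shapes below are assumptions for illustration, not the actual model code):

```python
import torch.nn as nn

class ConditionalTokenEmbedding(nn.Module):
    # Hypothetical sketch: the identity embedding is simply added to every
    # token embedding before feeding the sequence to the model.
    def __init__(self, vocab_size, num_identities, dim):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.id_emb = nn.Embedding(num_identities, dim)

    def forward(self, tokens, identity):
        # tokens: (batch, seq_len), identity: (batch,)
        return self.tok_emb(tokens) + self.id_emb(identity).unsqueeze(1)
```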

Hello! Small update: I just tried adding typical filtering to the sampling code, and the results are still far from perfect, but I managed to go from a very high...

Yes, basically, before calling multinomial sampling I do what the TypicalLogitsWarper function does: set the logits of the filtered-out tokens to -inf so the multinomial only samples from the...
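A rough sketch of that filtering step (a simplified version of what transformers' TypicalLogitsWarper does; the function name and defaults are illustrative):

```python
import torch
import torch.nn.functional as F

def typical_filter(logits, mass=0.9, filter_value=float("-inf")):
    # Simplified typical filtering on logits of shape (batch, vocab),
    # applied right before multinomial sampling.
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(-1, keepdim=True)
    # Distance of each token's surprisal from the distribution's entropy
    # (small = "typical" token).
    shifted = (-log_probs - entropy).abs()
    _, sorted_idx = torch.sort(shifted, dim=-1)
    sorted_probs = probs.gather(-1, sorted_idx)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Drop everything after the cumulative probability of the most typical
    # tokens exceeds `mass`, always keeping at least one token.
    sorted_remove = cumulative > mass
    sorted_remove[..., 1:] = sorted_remove[..., :-1].clone()
    sorted_remove[..., 0] = False
    remove = sorted_remove.scatter(-1, sorted_idx, sorted_remove)
    return logits.masked_fill(remove, filter_value)

# Usage sketch: filtered logits -> softmax -> multinomial only samples the kept tokens.
# probs = F.softmax(typical_filter(logits, mass=0.9), dim=-1)
# next_tokens = torch.multinomial(probs, num_samples=1)
```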

@LeeDoYup I'm pretty sure the temperature is applied to the logits before the softmax, thus affecting the multinomial sampling 🤔 but the things you mention are correlated: you change the temperature, so the probability...
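A toy example of what I mean by the temperature acting on the logits before the softmax (the values are made up):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.5]])  # toy logits

for temperature in (1.0, 0.5, 2.0):
    # Dividing the logits by the temperature changes the distribution that
    # torch.multinomial samples from: low T -> sharper, high T -> flatter.
    probs = F.softmax(logits / temperature, dim=-1)
    sample = torch.multinomial(probs, num_samples=1)
    print(temperature, probs.tolist(), sample.item())
```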

In some way, it makes sense to choose them randomly since during training you're masking them randomly, so the model is used to having to reconstruct random missing tokens, not...
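A minimal sketch of that random selection (names and shapes are assumptions, not the actual code):

```python
import torch

def random_unmask_step(mask, num_to_commit):
    # Commit a random subset of the currently masked positions, mirroring the
    # random masking the model sees during training.
    # mask: (batch, seq_len) bool, True where the token is still masked.
    new_mask = mask.clone()
    for b in range(mask.size(0)):
        masked_positions = mask[b].nonzero(as_tuple=True)[0]
        k = min(num_to_commit, masked_positions.numel())
        chosen = masked_positions[torch.randperm(masked_positions.numel())[:k]]
        new_mask[b, chosen] = False  # these positions are now fixed
    return new_mask
```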

I don't think you're doing it right. You're supposed to first sample for every masked token, then pick the top-k with the highest scores, since otherwise you don't really know the...
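A sketch of that order of operations (sample everything first, then keep only the top-k most confident; the names, shapes, and the use of the sampled token's probability as the score are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def confidence_unmask_step(logits, tokens, mask, num_to_commit, temperature=1.0):
    # logits: (batch, seq_len, vocab), tokens: (batch, seq_len),
    # mask: (batch, seq_len) bool, True where still masked.
    # Assumes num_to_commit <= number of still-masked positions per row.
    probs = F.softmax(logits / temperature, dim=-1)
    b, l, v = probs.shape
    # 1) Sample a token for *every* position.
    sampled = torch.multinomial(probs.view(-1, v), num_samples=1).view(b, l)
    # 2) Score each position by the probability of the token it sampled.
    confidence = probs.gather(-1, sampled.unsqueeze(-1)).squeeze(-1)
    confidence = confidence.masked_fill(~mask, float("-inf"))  # only rank masked positions
    # 3) Keep only the top-k most confident positions; the rest stay masked.
    keep = confidence.topk(num_to_commit, dim=-1).indices
    new_tokens = tokens.clone()
    new_mask = mask.clone()
    new_tokens.scatter_(1, keep, sampled.gather(1, keep))
    new_mask.scatter_(1, keep, False)
    return new_tokens, new_mask
```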