Kirill Mavreshko
Yes, I do have plans to implement Transformer-XL, but cannot promise how soon.
Hi! Thanks for the report. I'll look into it.
Hi! Yes, that is a reasonable feature. However, it has a low priority for now, since I don't have much time at the moment and a similar result can be achieved by introducing...
Hi! Could you please post an example that utilizes these changes? Perhaps a function that builds a model, similar to [`vanilla_transformer_gpt_model`](https://github.com/kpot/keras-transformer/blob/master/example/models.py#L74).
Hi! I was recently thinking about something similar. Django has had a feature for a while now: whenever you change your model, you can simply run its CLI command [`django-admin makemigrations`](https://docs.djangoproject.com/en/4.1/ref/django-admin/#django-admin-makemigrations)...
@mohs8421 > For example I can have a table in a first migration. But in the next migration, I want to add a reference to an other new table. Which...
@yyxiongzju Good point, I completely missed the fact (and the comment!) that `iterative_inv` works exclusively with post-softmax matrices. In that case your initialization is perfectly correct. Thanks for the explanation!...
@yyxiongzju A follow-up to the previous question: in the paper you say "For all our experiments, we need to run about 6 iterations in order to achieve a good approximation of...
@yyxiongzju Thanks for the details! I've thrown together a notebook containing an [alternative version of `iterative_inv`](https://github.com/kpot/Nystromformer/blob/new_inv/notebooks/iterative_inv_convergence_test.ipynb) along with some of my experiments. Could you please take a look? So far...
@yyxiongzju Thank you for your time, you did a good job explaining all this! I totally agree with you that everything should be practical in such things. If a...