RetNet model support

Open yoinked-h opened this issue 1 year ago • 18 comments

Model description

RetNet (Retentive Network) is a new model architecture released by Microsoft; the research paper is here. As of now, there is one RetNet model, made by me, which is undertrained (loss=8!), and I am trying to train a second model on a larger architecture.

Open source status

  • [X] The model implementation is available
  • [X] The model weights are available

Provide useful links for the implementation

commit that has RetNet training
@donglixp was the main author of the commit and is cited on the paper
all code is licensed under MIT, including model weights

yoinked-h avatar Aug 01 '23 17:08 yoinked-h

cc @ArthurZucker @younesbelkada

amyeroberts avatar Aug 01 '23 18:08 amyeroberts

p.s. if Google offered any bigger TPUs for TRC, i could train retnet-3b (the point at which RetNet is better than regular transformers), but as of now there's retnet_base (small) and retnet_medium (i'll upload it when it gets good)

yoinked-h avatar Aug 01 '23 18:08 yoinked-h

I am wondering if the original authors released the trained models?

ydshieh avatar Aug 02 '23 07:08 ydshieh

as far as i know, no official pretrained models were released by Microsoft, but the training code is in the torchscale repo, so that's how i am training the models

yoinked-h avatar Aug 02 '23 09:08 yoinked-h

Cool model! But as long as we don't have official or very good pretrained checkpoints, there's not really anything we can do!

ArthurZucker avatar Aug 02 '23 09:08 ArthurZucker

ah, understood, i'll try to get a good checkpoint; but for now, i assume i can close this and reopen when it finishes training

yoinked-h avatar Aug 02 '23 09:08 yoinked-h

oops

yoinked-h avatar Aug 02 '23 10:08 yoinked-h

https://huggingface.co/parsee-mizuhashi/retnet/tree/main trained it for 1M steps, loss is around 4.2, hope this is good enough for some inference code

yoinked-h avatar Aug 05 '23 16:08 yoinked-h

My recommendation would be to put the model on the hub following this tutorial, which will help you get working code without going through the hassle of the whole review process! Then if the model is highly requested, has a lot of usage, or has officially released checkpoints, we'll add it to transformers! Does that make sense to you @yoinked-h? 🤗

ArthurZucker avatar Aug 07 '23 07:08 ArthurZucker

If you implement it or link some useful code for training we could provide some computing power

flozi00 avatar Aug 07 '23 08:08 flozi00

My recommendation would be to put the model on the hub following this tutorial, which will help you get working code without going through the hassle of the whole review process! Then if the model is highly requested, has a lot of usage, or has officially released checkpoints, we'll add it to transformers! Does that make sense to you @yoinked-h? 🤗

yeah, i'll try to make the custom model scripts and push them to the hub

If you implement it or link some useful code for training we could provide some computing power

the training code is kind of buggy (doesn't work with TPU accelerate) but here; i also have a shell script which does most of the work for setup->training

yoinked-h avatar Aug 07 '23 09:08 yoinked-h

I started training a small model (around 300M params) on German data. It's HF compatible, and I should push the code to the hub too.

flozi00 avatar Aug 07 '23 11:08 flozi00

300M and 1300M models are training. After finding a bug in the learning rate scheduling, the loss is decreasing again. The text is grammatically okay but doesn't make sense right now. Looking forward to the new run 😁 Will push the weights and code to the hub on Friday, I think.

flozi00 avatar Aug 08 '23 08:08 flozi00

https://huggingface.co/flozi00/RetNet-300m-German

Maybe I'll find some time to train larger models, for example 7B, when I am not ill anymore

flozi00 avatar Aug 09 '23 16:08 flozi00

https://huggingface.co/papers/2307.08621#64bff688661694889faecdb2

Will be waiting for the release from Microsoft

flozi00 avatar Aug 10 '23 21:08 flozi00

Hello everyone, is there any better pre-trained model available now?

zzczzc20 avatar Sep 04 '23 13:09 zzczzc20

hey @yoinked-h, can you help me further with how you managed to train a RetNet model? I can't seem to manage it. If possible, can you share a Python file or notebook? Thank you so much in advance

risedangel avatar Sep 22 '23 08:09 risedangel

I published a RetNet model for study; you can try it: https://huggingface.co/wac81/toy_retnet_1.3b_pretrain

Hello everyone, is there any better pre-trained model available now?

wac81 avatar May 11 '24 14:05 wac81