Florian Zimmermeister
I started training a small model (around 300m params) on German data. It's HF compatible and should push the code to the hub too.
300m and 1300m models are training. After finding a bug in the learning rate scheduling, the loss is decreasing again. The text is grammatically okay but doesn't make sense right now....
https://huggingface.co/flozi00/RetNet-300m-German Maybe I'll find some time to train larger models, for example 7b, when I am not ill anymore
https://huggingface.co/papers/2307.08621#64bff688661694889faecdb2 Will be waiting for the release from Microsoft
Are you running on Linux? Try increasing the swap size; most of the time Linux just prints "Killed" when a process runs out of CPU memory
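Before increasing the swap size, it can help to check how much swap is currently configured. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; the function names `parse_meminfo` and `swap_report` are made up for this illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def swap_report(text):
    """Summarise RAM and swap in MiB from /proc/meminfo-style text."""
    info = parse_meminfo(text)
    swap_total_kb = info.get("SwapTotal", 0)
    swap_free_kb = info.get("SwapFree", 0)
    return {
        "ram_total_mib": info.get("MemTotal", 0) // 1024,
        "swap_total_mib": swap_total_kb // 1024,
        "swap_used_mib": (swap_total_kb - swap_free_kb) // 1024,
    }


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            print(swap_report(f.read()))
    except FileNotFoundError:
        print("/proc/meminfo not found; this sketch only applies to Linux")
```

If `swap_total_mib` is zero or small compared to what the job needs, running out of memory and getting "Killed" is a plausible explanation.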
The database approach requires users to trust the developer with sensitive data, for example private images that get stored on third-party storage. An made by mistake public, HF...
Yes, probably middle or end of the week
Would love to help, but I have no experience with React. I came across this framework after checking Hugging Face Spaces. File upload would be one of the most useful features for...
Just giving my 5 cents to this thread: I think enabling 8-bit training by this PR is an important factor. Just adding embd layers to modules to save crashes...
I think a possible solution would be resizing the model's embedding layers, saving the model, and then loading it in 8-bit as normal
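A rough sketch of that workaround with the Hugging Face transformers API, not tested against the PR discussed here. It assumes a causal LM checkpoint, that `bitsandbytes` and `accelerate` are installed, and that a CUDA GPU is available; the function name `resize_then_load_8bit` and its parameters are hypothetical.

```python
def resize_then_load_8bit(base_model, new_tokens, out_dir):
    """Resize embeddings in full precision, save, then reload the result in 8-bit."""
    # Imported lazily so the sketch can be read and defined without the libraries installed.
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    # 1. Extend the tokenizer and resize the embeddings on the unquantized model.
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.add_tokens(list(new_tokens))
    model = AutoModelForCausalLM.from_pretrained(base_model)
    model.resize_token_embeddings(len(tokenizer))

    # 2. Save the resized model and tokenizer together.
    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
    del model  # free the full-precision copy before reloading

    # 3. Load the saved checkpoint in 8-bit as usual.
    quantized = AutoModelForCausalLM.from_pretrained(
        out_dir,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        device_map="auto",
    )
    return quantized, tokenizer
```

The point of the ordering is that the resize happens before quantization, so the 8-bit load only ever sees a checkpoint whose embedding size already matches the tokenizer.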