Training Code
Hi there,
Great work. I wanted to enquire whether we can train the model on a custom dataset, and if the training code will be made available.
Regards,
hey there - thanks! i’m not ready to release the training code yet but can potentially run the training job for you, happy to chat if you’re interested in that.
Just wondering, what if we pretrained the model on the entire Danbooru or Gelbooru dataset? There is a lot of useful information there in the form of tags.
@Muinez that's a great idea, it has a very rich collection of tags
@vikhyat what was the training system and cost? Is this expensive to train? Seems like a cool project to try myself, but I'm worried about cost 💸
Each training run currently takes 20 hours on 4x4090, but it was 4x that before I wrote a bunch of custom CUDA kernels to speed up training. And it took 172 experiments to get to this point, so I'm about $20K in the hole on this project. 😬
@vikhyat I hear ya! I have 4x 4090s too, but split over 2 machines, maybe doable? I'm not a CUDA guy though, so I would just train 4x longer 🤣 I would be interested in repeating your work, but with TinyLlama, as a side project.
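For what it's worth, here's a minimal sketch of what a two-machine data-parallel run might look like with plain PyTorch DDP. The model, data, and hyperparameters below are toy placeholders, not anything from this repo:

```python
# Minimal two-node DDP sketch (toy model/data, not this repo's training code).
# Launch the same script on both machines, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=2 --node_rank=0 \
#            --master_addr=<machine-0-ip> --master_port=29500 train_ddp.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

def main():
    dist.init_process_group(backend="nccl")   # torchrun supplies rank/world-size env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy model and synthetic data, standing in for the real ones.
    model = DDP(nn.Linear(512, 512).cuda(local_rank), device_ids=[local_rank])
    data = TensorDataset(torch.randn(1024, 512), torch.randn(1024, 512))
    sampler = DistributedSampler(data)        # shards batches across all 4 GPUs
    loader = DataLoader(data, batch_size=32, sampler=sampler)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for epoch in range(3):
        sampler.set_epoch(epoch)              # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()                   # gradients all-reduce across both nodes here
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The main caveat with splitting 4 GPUs across 2 machines is interconnect: gradient all-reduce goes over the network, so without fast links between the boxes you may not see anywhere near a 2x speedup over a single machine.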
I would love to help on this project. Are you looking for people?
I can donate my 1080 Ti for this if needed. Willing to help in any way. Seems like a great project!
Why not run on the cloud? I have 1x 3090 but realised my investment was more about convenience than cost efficiency.
Hi all, I can report that after a few optimizations (float16 usage instead of float32 is the main one, plus a few minor others) I was able to run this model really fast on a GTX 1080 Ti with only about 4.6 GB of VRAM usage. This model truly runs anywhere!
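In case it's useful to anyone else, the core of the change is just casting the weights and inputs to half precision. Here's a minimal sketch with a toy model standing in for the real one (this isn't the repo's exact loading code):

```python
# Sketch of the float16 change: cast weights and inputs to half precision.
# The nn.Sequential below is a toy stand-in; the same .half() cast applies
# to any nn.Module you load.
import torch
import torch.nn as nn

device = torch.device("cuda")

model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768))
model = model.half().to(device).eval()   # fp16 weights: roughly half the VRAM of fp32

with torch.no_grad():
    # Inputs must match the weights' dtype, so cast them to float16 as well.
    x = torch.randn(1, 768, device=device, dtype=torch.float16)
    y = model(x)

print(y.dtype)  # torch.float16
```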
@vikhyat I'd like to help with this project's development. When do you think you'll be ready to publish the training code?