Training Code

Open neurlnetworker opened this issue 1 year ago • 11 comments

Hi there,

Great work. I wanted to enquire whether we can train the model on a custom dataset and whether the training code will be made available.

Regards,

neurlnetworker avatar Jan 24 '24 04:01 neurlnetworker

hey there - thanks! i’m not ready to release the training code yet but can potentially run the training job for you, happy to chat if you’re interested in that.

vikhyat avatar Jan 24 '24 07:01 vikhyat

Just wondering, what if the model were pretrained on the entire Danbooru or Gelbooru dataset? There is a lot of useful information there in the form of tags.
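For illustration, a minimal sketch of how booru-style tags might be flattened into natural-language captions for pretraining (the helper function and tag names below are hypothetical, not part of this repo):

```python
# Hypothetical sketch: flatten booru-style tags into a plain-text caption
# that could be paired with the image during pretraining.
def tags_to_caption(tags: list[str]) -> str:
    # booru tags use underscores ("blue_sky"); replace them for natural text
    cleaned = [t.replace("_", " ") for t in tags]
    return "An image depicting " + ", ".join(cleaned) + "."

print(tags_to_caption(["blue_sky", "outdoors", "cherry_blossoms"]))
# -> "An image depicting blue sky, outdoors, cherry blossoms."
```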

Muinez avatar Jan 24 '24 13:01 Muinez

@Muinez that's a great idea, it has a very rich collection of tags

vikhyat avatar Jan 24 '24 23:01 vikhyat

@vikhyat what was the training system and cost? Is this expensive to train? Seems like a cool project to try myself, but I'm worried about cost 💸

dnhkng avatar Jan 30 '24 05:01 dnhkng

Each training run currently takes 20 hours on 4x4090, but it was 4x that before I wrote a bunch of custom CUDA kernels to speed up training. And it took 172 experiments to get to this point, so I'm about $20K in the hole on this project. 😬

vikhyat avatar Jan 30 '24 08:01 vikhyat

@vikhyat I hear ya! I have 4x 4090s too, but split over 2 machines, maybe doable? I'm not a CUDA guy though, so I would just train 4x longer 🤣 I would be interested in repeating your work, but with TinyLlama, as a side project.

dnhkng avatar Jan 30 '24 09:01 dnhkng

> @vikhyat I hear ya! I have 4x 4090s too, but split over 2 machines, maybe doable? I'm not a CUDA guy though, so I would just train 4x longer 🤣
>
> I would be interested in repeating your work, but with TinyLlama, as a side project.

I would love to help on this project. Are you looking for people?

julien-blanchon avatar Jan 31 '24 08:01 julien-blanchon

I can donate my 1080 Ti for this if needed. Willing to help in any way. Seems like a great project!

GiladLeef avatar Jan 31 '24 15:01 GiladLeef

Why not run on the cloud? I have 1x 3090 but realised my investment was more about convenience than cost efficiency.

sujitvasanth avatar Feb 01 '24 02:02 sujitvasanth

Hi all, I can report that after a few optimizations (float16 usage instead of float32 is the main one, plus a few minor others) I was able to run this model really fast on a GTX 1080 Ti with only about 4.6 GB of VRAM usage. This model truly runs anywhere!
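For reference, a minimal sketch of that kind of change, assuming the model is loaded from the Hugging Face checkpoint with the transformers library: requesting float16 weights up front roughly halves the memory footprint compared to the float32 default.

```python
# Minimal sketch (assumes the Hugging Face checkpoint and the transformers
# library; not the exact code used above): load the weights in float16
# instead of float32 to roughly halve VRAM usage.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream1",
    torch_dtype=torch.float16,   # float16 weights instead of float32
    trust_remote_code=True,      # the checkpoint ships custom modeling code
)
model = model.to("cuda").eval()
```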

@vikhyat I'd like to help with this project's development. When do you think you'll be ready to publish the training code?

GiladLeef avatar Feb 06 '24 07:02 GiladLeef