Training Code

Open neurlnetworker opened this issue 1 year ago • 11 comments

Hi there,

Great work. I wanted to enquire whether we can train the model on a custom dataset and whether the training code will be made available.

Regards,

neurlnetworker avatar Jan 24 '24 04:01 neurlnetworker

hey there - thanks! i’m not ready to release the training code yet but can potentially run the training job for you, happy to chat if you’re interested in that.

vikhyat avatar Jan 24 '24 07:01 vikhyat

Just wondering, what if the model were pretrained on the entire Danbooru or Gelbooru dataset? There is a lot of useful information there in the form of tags.
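For illustration, a minimal sketch of how booru-style tags might be flattened into natural-language captions for pretraining (the helper function and tag names below are hypothetical, not part of this repo):

```python
# Hypothetical sketch: flatten booru-style tags into a plain-text caption
# that could be paired with the image during pretraining.
def tags_to_caption(tags: list[str]) -> str:
    # booru tags use underscores ("blue_sky"); replace them for natural text
    cleaned = [t.replace("_", " ") for t in tags]
    return "An image depicting " + ", ".join(cleaned) + "."

print(tags_to_caption(["blue_sky", "outdoors", "cherry_blossoms"]))
# -> "An image depicting blue sky, outdoors, cherry blossoms."
```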

Muinez avatar Jan 24 '24 13:01 Muinez

@Muinez that's a great idea, it has a very rich collection of tags

vikhyat avatar Jan 24 '24 23:01 vikhyat

@vikhyat what was the training system and cost? Is this expensive to train? Seems like a cool project to try myself, but I'm worried about cost 💸

dnhkng avatar Jan 30 '24 05:01 dnhkng

Each training run currently takes 20 hours on 4x4090, but it was 4x that before I wrote a bunch of custom CUDA kernels to speed up training. And it took 172 experiments to get to this point, so I'm about $20K in the hole on this project. 😬

vikhyat avatar Jan 30 '24 08:01 vikhyat

@vikhyat I hear ya! I have 4x 4090s too, but split over 2 machines, maybe doable? I'm not a CUDA guy though, so I would just train 4x longer 🤣 I would be interested in repeating your work, but with TinyLlama, as a side project.

dnhkng avatar Jan 30 '24 09:01 dnhkng

> @vikhyat I hear ya! I have 4x 4090s too, but split over 2 machines, maybe doable? I'm not a CUDA guy though, so I would just train 4x longer 🤣
>
> I would be interested in repeating your work, but with TinyLlama, as a side project.

I would love to help on this project. Are you looking for people?

julien-blanchon avatar Jan 31 '24 08:01 julien-blanchon

I can donate my 1080 Ti for this if needed. Willing to help in any way. Seems like a great project!

GiladLeef avatar Jan 31 '24 15:01 GiladLeef

Why not run on the cloud? I have 1x 3090 but realised my investment was more about convenience than cost efficiency.

sujitvasanth avatar Feb 01 '24 02:02 sujitvasanth

Hi all, I can report that after a few optimizations (float16 usage instead of float32 is the main one, plus a few minor others) I was able to run this model really fast on a GTX 1080 Ti with only about 4.6 GB of VRAM usage. This model truly runs anywhere!
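For reference, a minimal sketch of that kind of change, assuming the model is loaded from the Hugging Face checkpoint with the transformers library: requesting float16 weights up front roughly halves the memory footprint compared to the float32 default.

```python
# Minimal sketch (assumes the Hugging Face checkpoint and the transformers
# library; not the exact code used above): load the weights in float16
# instead of float32 to roughly halve VRAM usage.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream1",
    torch_dtype=torch.float16,   # float16 weights instead of float32
    trust_remote_code=True,      # the checkpoint ships custom modeling code
)
model = model.to("cuda").eval()
```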

@vikhyat I'd like to help with this project's development. When do you think you'll be ready to publish the training code?

GiladLeef avatar Feb 06 '24 07:02 GiladLeef