Lazy3valuation

5 comments from Lazy3valuation

From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it with 12 GB using 8-bit quantization...
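For reference, a minimal sketch of what loading the model with 8-bit quantization might look like with transformers and bitsandbytes; the checkpoint name is a placeholder and not taken from the comment:

```python
# Sketch: loading a Gemma checkpoint in 8-bit so it fits in roughly 12 GB of VRAM.
# The model id is an assumption for illustration, not from the original comment.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b"  # hypothetical checkpoint

quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)
```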

No, sadly not: trying to train with your code makes my GPU run out of memory, and trying to run it with LoRA breaks the model, printing (under...

Playing around, I managed to stop getting the "inf" error: besides adding "modules_to_save" to the LoRA config, I was loading the model in fp16; switching it back to torch_dtype="auto"...
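A rough sketch of the two changes described above (adding modules_to_save to the LoRA config and loading the base model with torch_dtype="auto" instead of fp16); the checkpoint, module names, and hyperparameters are illustrative assumptions, not the repo's actual settings:

```python
# Sketch: LoRA config with modules_to_save, base model loaded with torch_dtype="auto".
# Module names and LoRA hyperparameters below are assumptions for illustration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",      # placeholder checkpoint
    torch_dtype="auto",     # let transformers pick the checkpoint's stored dtype
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # also train these layers fully
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```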

After about 40 minutes of training with 4-bit precision and a block_size of 600 (I can't train the model in 8-bit precision; the max block size before running out of memory was 15)...
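A minimal sketch of the 4-bit setup mentioned above, assuming NF4 quantization via bitsandbytes and a block_size of 600; the exact settings in the original training run may differ:

```python
# Sketch: 4-bit (NF4) model loading with an illustrative block_size of 600.
# Settings are assumptions; the actual training script may use different values.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

block_size = 600  # sequence length per training block

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",              # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```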

I'll try, but I'm still studying deep learning and Transformer models, so I'm not sure I'll make it work. Any chance you could release a trained model with 1M context? 👀