Lazy3valuation

5 comments from Lazy3valuation

From the README: "Can train 'infinite' context -- check train.gemma.infini.noclm.1Mseq.sh with 1x H100 80G (with AdamW optimizer, No gradient checkpointing)". However, I can train it with 12 GB using 8-bit quantization...
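For reference, a minimal sketch of what loading the model with 8-bit quantization might look like with transformers and bitsandbytes; the checkpoint name is a placeholder and not taken from the comment:

```python
# Sketch: loading a Gemma checkpoint in 8-bit so it fits in roughly 12 GB of VRAM.
# The model id is an assumption for illustration, not from the original comment.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2b"  # hypothetical checkpoint

quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers on the available GPU(s)
)
```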

No, sadly not: trying to train with your code makes my GPU run out of memory, and trying to run it with LoRA breaks the model, printing (under...

Playing around, I managed to stop getting the "inf" error: besides adding "modules_to_save" to the LoRA config, I was loading the model in fp16; switching it back to torch_dtype="auto"...
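A rough sketch of the two changes described above (adding modules_to_save to the LoRA config and loading the base model with torch_dtype="auto" instead of fp16); the checkpoint, module names, and hyperparameters are illustrative assumptions, not the repo's actual settings:

```python
# Sketch: LoRA config with modules_to_save, base model loaded with torch_dtype="auto".
# Module names and LoRA hyperparameters below are assumptions for illustration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",      # placeholder checkpoint
    torch_dtype="auto",     # let transformers pick the checkpoint's stored dtype
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["embed_tokens", "lm_head"],  # also train these layers fully
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```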

After about 40 minutes of training with 4-bit precision and a block_size of 600 (I can't train the model in 8-bit precision; the max block size before running out of memory was 15)...
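A minimal sketch of the 4-bit setup mentioned above, assuming NF4 quantization via bitsandbytes and a block_size of 600; the exact settings in the original training run may differ:

```python
# Sketch: 4-bit (NF4) model loading with an illustrative block_size of 600.
# Settings are assumptions; the actual training script may use different values.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

block_size = 600  # sequence length per training block

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b",              # placeholder checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```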

I'll try, but I'm still studying deep learning and Transformer models, so I'm not sure I'll make it work. Any chance you could release a trained model with 1M context? 👀