Ozan Ciga
to add to this, every time i save a checkpoint with optimizer states, i notice an increase in RAM use. at a certain point the process OOMs and the script crashes. as...
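for anyone trying to reproduce: a minimal sketch along these lines (toy model, psutil for measuring RSS; not my actual setup) shows whether resident memory grows across repeated saves:

```python
# minimal repro sketch: toy model/optimizer stand in for the real training setup
import gc
import os

import psutil
import torch

model = torch.nn.Linear(2048, 2048)
optimizer = torch.optim.AdamW(model.parameters())

# run one step so the optimizer actually has state to checkpoint
loss = model(torch.randn(8, 2048)).sum()
loss.backward()
optimizer.step()

proc = psutil.Process(os.getpid())
for step in range(5):
    # save model + optimizer states, as a full checkpoint would
    torch.save(
        {"model": model.state_dict(), "optimizer": optimizer.state_dict()},
        f"ckpt_{step}.pt",
    )
    gc.collect()  # if RSS keeps climbing past this, something is holding references
    print(f"save {step}: RSS = {proc.memory_info().rss / 1e6:.1f} MB")
```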
hey @tjruwase certainly, here it is below. also, the OOM happens in RAM, not VRAM (verified with htop). regarding the code, it is pretty standard transformers Trainer boilerplate, sketched below, also...
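the setup is roughly the following sketch (the model, dataset, and the commented-out deepspeed config path are placeholders i'm filling in here, not the exact script):

```python
# rough sketch of the trainer boilerplate (placeholders, not the exact script)
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder; the real run uses a larger model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="out",
    save_steps=500,       # each save writes model + optimizer states
    save_total_limit=2,   # pruning old checkpoints doesn't stop the RAM growth
    per_device_train_batch_size=2,
    # deepspeed="ds_config.json",  # placeholder path for the deepspeed config
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```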
hey @LiJunnan1992 what is your opinion on using adapters to update the pretrained model? or something like low-rank adaptation (LoRA)? wondering if such partial training can be applied to...
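for reference, the core of LoRA is small enough to sketch: freeze the pretrained weight and learn a low-rank update on top of it (a generic illustration, not tied to this repo's model):

```python
# generic LoRA illustration: freeze W, learn a low-rank update (alpha/r) * B @ A
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        self.lora_A = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))  # B starts at zero,
        self.scaling = alpha / rank                            # so the update is 0 initially

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))  # only lora_A / lora_B receive gradients
```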
i would appreciate this feature! currently, i do something very hacky like below and it works (once all files are downloaded), but it's pretty clunky

```python
from llama_cpp import Llama
...
```
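roughly what the hack looks like, reconstructed with placeholder repo and file names (the originals were cut off above); the caveat is that every model file has to be fetched before loading works:

```python
# hypothetical version of the workaround: fetch the gguf file from the hub by
# hand, then point llama_cpp at the local path (repo/filename are placeholders)
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGUF",   # placeholder repo
    filename="llama-2-7b.Q4_K_M.gguf",    # placeholder quant file
)

llm = Llama(model_path=model_path, n_ctx=2048)
print(llm("Q: what is 2 + 2? A:", max_tokens=8)["choices"][0]["text"])
```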