Jack BAI
Same problem here: I'm using multi-image, multi-turn generation with Hugging Face, and would appreciate any help! All model sizes (4b, 12b, 27b) have this problem, but it occurs randomly and intermittently...
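For reference, the setup looks roughly like this. (Sketch only: the model id, image files, prompts, and generation settings below are placeholders, not my exact script.)

```
# Sketch: two images in the first user turn, then a text-only follow-up turn.
import torch
from PIL import Image
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

model_id = "google/gemma-3-4b-it"  # same behaviour reported on 12b / 27b
processor = AutoProcessor.from_pretrained(model_id)
model = Gemma3ForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": Image.open("img_1.png")},  # placeholder images
        {"type": "image", "image": Image.open("img_2.png")},
        {"type": "text", "text": "Compare these two images."},
    ],
}]

def generate_reply(messages):
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device, dtype=torch.bfloat16)
    with torch.inference_mode():
        out = model.generate(**inputs, max_new_tokens=128)
    return processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)

# Turn 1, then append the reply and ask a follow-up question in turn 2.
reply = generate_reply(messages)
messages.append({"role": "assistant", "content": [{"type": "text", "text": reply}]})
messages.append({"role": "user", "content": [{"type": "text", "text": "Which one is brighter?"}]})
reply_2 = generate_reply(messages)
```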
@FredrikNoren did you try using the gemma repo instead of HF? Does the problem also occur with the original repo, or only with HF?
I just tried this:

```
# force the eager attention implementation instead of the default
self.model = Gemma3ForConditionalGeneration.from_pretrained(
    ...,  # model id and other kwargs unchanged
    attn_implementation="eager",
).eval()
```

Maybe you can also try setting the attention implementation manually (`eager`) first.
I also encountered this error. The saved checkpoint is suspiciously small and not usable at all.
Dear @tjruwase, thanks, I will try using the destroy method. I meant exactly what you describe: basically, we launch the `.py` file with `deepspeed`, and within this `.py` file I...
Dear @tjruwase, it seems that the CPU memory is not freed after destroying the model engine under ZeRO-3, although the GPU memory is freed. Below is a minimal reproduction...
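(Sketch of the shape of the reproduction, with a toy `nn.Linear` standing in for the real model and a minimal hypothetical ZeRO-3 config; launched with `deepspeed repro.py`.)

```
# Build a ZeRO-3 engine, destroy it, and check whether host RSS drops.
import gc
import psutil
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
    "bf16": {"enabled": True},
}

def rss_gb():
    return psutil.Process().memory_info().rss / 1e9

for epoch in range(3):
    model = torch.nn.Linear(8192, 8192)  # toy stand-in for the real model
    engine, _, _, _ = deepspeed.initialize(
        model=model, model_parameters=model.parameters(), config=ds_config
    )
    print(f"epoch {epoch}: RSS after init    = {rss_gb():.2f} GB")

    engine.destroy()           # GPU memory is released as expected
    del engine, model
    gc.collect()
    torch.cuda.empty_cache()
    print(f"epoch {epoch}: RSS after destroy = {rss_gb():.2f} GB")  # CPU RSS does not drop
```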
For a bit more context: I have to destroy the engine in each epoch because I need to run vllm after each epoch, which is omitted in the...
Edit: the problem happens on 12b and 27b gemma-3, but not on 4b. **Also, when running inference with the updated 4b model, generation becomes extremely slow once the batch size exceeds 32.**...
Sure, e.g. this is the saved 12b config:

```
{
  "architectures": [
    "Gemma3ForConditionalGeneration"
  ],
  "boi_token_index": 255999,
  "eoi_token_index": 256000,
  "eos_token_id": [
    1,
    106
  ],
  "image_token_index": 262144,
  "initializer_range": 0.02,
  "mm_tokens_per_image": 256,
  "model_type": ...
```
Yes, this is 12b. The 12b model also does not work with save/reload.