Jack BAI


Same problem here. I'm using multi-image, multi-turn generation with Hugging Face and would appreciate any help! All model sizes (4b, 12b, 27b) have this problem, but it is random and contingent...

@FredrikNoren did you try using the Gemma repo instead of HF? Does the problem also occur with the original repo, or is it only an issue with HF?

I just tried this:

```python
from transformers import Gemma3ForConditionalGeneration

self.model = Gemma3ForConditionalGeneration.from_pretrained(
    ...,  # model path elided in the original comment
    attn_implementation="eager",
).eval()
```

Maybe you can also try setting the attention implementation manually (`eager`) first.
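For context (my assumption, not confirmed): `eager` bypasses the fused SDPA/flash-attention kernels, so if the problem disappears with it, the fused attention path is the likely culprit rather than the weights themselves.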

I also encountered this error. The saved checkpoint is far too small and not usable at all.

Dear @tjruwase, thanks, I will try the destroy method. I meant exactly what you described: basically we launch the `.py` file with `deepspeed`, and within this `.py` file I...
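To make the setup concrete, here is a minimal sketch of what I mean (placeholders, not the real training script; it assumes a DeepSpeed version that exposes `DeepSpeedEngine.destroy()`). You would launch this file with the DeepSpeed launcher, e.g. `deepspeed train.py`:

```python
import deepspeed
import torch

# Illustrative ZeRO-3 config; the real one is more elaborate.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 3},
}

model = torch.nn.Linear(1024, 1024)  # stand-in for the real model

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# ... training loop ...

engine.destroy()  # tear the engine down explicitly once the epoch is done
```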

Dear @tjruwase, it seems the CPU memory is not freed after destroying the model engine under ZeRO-3, although the GPU memory is freed. Below is the minimal reproduction...
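The reproduction itself is cut off in the snippet above; as a stand-in, here is my own sketch of the kind of check involved, measuring host RSS around `engine.destroy()` under ZeRO-3 (the toy model and config are assumptions; run it with the DeepSpeed launcher, e.g. `deepspeed repro.py`):

```python
import gc
import os

import deepspeed
import psutil
import torch

def rss_gb() -> float:
    """Resident set size of this process, in GB."""
    return psutil.Process(os.getpid()).memory_info().rss / 1e9

model = torch.nn.Linear(8192, 8192)
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config={
        "train_micro_batch_size_per_gpu": 1,
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "zero_optimization": {"stage": 3},
    },
)

print(f"RSS before destroy: {rss_gb():.2f} GB")
engine.destroy()
del engine, model
gc.collect()
torch.cuda.empty_cache()  # GPU memory is released here...
print(f"RSS after destroy:  {rss_gb():.2f} GB")  # ...but host RSS stays high
```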

For a bit more context: I have to destroy the engine in each epoch because I need to run vLLM after each epoch, which is omitted in the...
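The omitted part presumably interleaves training and generation; a sketch of that per-epoch flow under my assumptions (the helpers `train_one_epoch` and `run_generation` are hypothetical, and vLLM needs an HF-format checkpoint, so the ZeRO-3 shards must be consolidated first) might look like:

```python
from vllm import LLM

for epoch in range(num_epochs):
    engine = train_one_epoch()        # hypothetical: returns the live DeepSpeed engine
    # Consolidates ZeRO-3 shards into one fp16 model file; requires
    # stage3_gather_16bit_weights_on_model_save=True in the DS config.
    engine.save_16bit_model("ckpt/")
    engine.destroy()                  # free GPU memory so vLLM can allocate its KV cache
    llm = LLM(model="ckpt/")          # assumes config/tokenizer files also live in ckpt/
    run_generation(llm)               # hypothetical generation/eval step
    del llm                           # release vLLM before the next training epoch
```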

Edit: the problem happens on the 12b and 27b Gemma 3 models, but not the 4b. **Also, when running inference with the updated 4b model, inference becomes extremely slow once the batch size is > 32.**...

Sure, e.g. this is the saved 12b config:

```json
{
  "architectures": ["Gemma3ForConditionalGeneration"],
  "boi_token_index": 255999,
  "eoi_token_index": 256000,
  "eos_token_id": [1, 106],
  "image_token_index": 262144,
  "initializer_range": 0.02,
  "mm_tokens_per_image": 256,
  "model_type": ...
```