André Bauer

3 comments by André Bauer

+1 for this feature. Is there already a way to get the desired behavior?

> In the config json, set "stage3_prefetch_bucket_size": 0, that should work

While this might "work", it still does not solve the problem, for example with `mixtral`, since this kind of MoE...
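For context, the workaround quoted above means placing that key inside the `zero_optimization` section of the DeepSpeed config JSON. A minimal sketch, assuming ZeRO stage 3 is already in use (the surrounding keys are illustrative, not taken from the thread):

```json
{
  "zero_optimization": {
    "stage": 3,
    "stage3_prefetch_bucket_size": 0
  }
}
```

Setting the bucket size to 0 disables parameter prefetching, which sidesteps the issue at the cost of some throughput.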

> I had some success loading the model this way:
>
> ```
> with deepspeed.OnDevice(dtype=dtype, device="meta"):
>     model = AutoModelForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True)
> model = deepspeed.init_inference(
>     model,
>     tensor_parallel...
> ```