
Better device_map and max_memory for loading vicuna model

Open sleeepeer opened this issue 10 months ago • 0 comments

Hi! Thank you for this wonderful repo. When I was trying to load the vicuna model with limited VRAM across several different GPUs, I discovered that your "max_memory" computation can cause loading to fail: see model_adapter.py, lines 219 to 231.

        if num_gpus != 1:
            kwargs["device_map"] = "auto"
            if max_gpu_memory is None:
                kwargs[
                    "device_map"
                ] = "sequential"  # This is important for not the same VRAM sizes
                available_gpu_memory = get_gpu_memory(num_gpus)
                kwargs["max_memory"] = {
                    i: str(int(available_gpu_memory[i] * 0.85)) + "GiB"
                    for i in range(num_gpus)
                }
            else:
                kwargs["max_memory"] = {i: max_gpu_memory for i in range(num_gpus)}

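For reference, the `max_memory` dictionary that `accelerate` expects maps device indices to size strings. A minimal pure-Python sketch of the 0.85-headroom computation in the snippet above (`build_max_memory` is a hypothetical helper; FastChat's actual `get_gpu_memory` queries torch for per-GPU free memory in GiB):

```python
def build_max_memory(available_gpu_memory, fraction=0.85):
    """Sketch of the snippet above: cap each GPU at a fraction of its
    reported free memory, formatted as accelerate-style size strings.
    available_gpu_memory: list of per-GPU free memory values in GiB."""
    return {
        i: str(int(free * fraction)) + "GiB"
        for i, free in enumerate(available_gpu_memory)
    }

# Example: two GPUs reporting 10 GiB and 24 GiB free.
print(build_max_memory([10.0, 24.0]))  # {0: '8GiB', 1: '20GiB'}
```

Note that `int()` truncates, so each GPU's budget can be rounded down by up to 1 GiB on top of the 15% headroom; with already-tight VRAM this may leave too little room for the weights and trigger the loading failure described here.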
If "available_gpu_memory" and "kwargs["max_memory"]" are commented out, loading succeeds. I wonder why you compute max_memory here and pass it as one of the kwargs when loading the model, since it can cause loading to fail.

Thanks again for your time and effort!

sleeepeer — Apr 02 '24 13:04