FastChat
Better device_map and max_memory for loading vicuna model
Hi! Thank you for this wonderful repo. When I was trying to load the vicuna model with limited VRAM across multiple GPUs, I discovered that your "max_memory" logic (model_adapter.py, lines 219 to 231) can cause loading to fail:
```python
if num_gpus != 1:
    kwargs["device_map"] = "auto"
    if max_gpu_memory is None:
        kwargs[
            "device_map"
        ] = "sequential"  # This is important for not the same VRAM sizes
        available_gpu_memory = get_gpu_memory(num_gpus)
        kwargs["max_memory"] = {
            i: str(int(available_gpu_memory[i] * 0.85)) + "GiB"
            for i in range(num_gpus)
        }
    else:
        kwargs["max_memory"] = {i: max_gpu_memory for i in range(num_gpus)}
```
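For context, here is a minimal, self-contained sketch of what that `max_memory` dict ends up looking like. It reproduces the 0.85-cap computation in isolation (assuming, as in FastChat, that `get_gpu_memory()` returns each GPU's available memory in GiB); note that `int()` truncates, so the cap can round down below what a large model shard actually needs:

```python
def build_max_memory(available_gpu_memory, ratio=0.85):
    """Mirror of the snippet above: cap each GPU at `ratio` of its
    available VRAM, expressed as a "<N>GiB" string keyed by GPU index.
    `available_gpu_memory` is a list of per-GPU free memory in GiB."""
    return {
        i: str(int(mem * ratio)) + "GiB"
        for i, mem in enumerate(available_gpu_memory)
    }

# e.g. two GPUs with 24 GiB and 11 GiB free:
print(build_max_memory([24.0, 11.0]))  # {0: '20GiB', 1: '9GiB'}
```

So with 11 GiB free, the loader is only allowed 9 GiB on that GPU, which can be too tight for the vicuna weights plus activation overhead.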
If I comment out the `available_gpu_memory` and `kwargs["max_memory"]` lines, loading succeeds. I wonder why you compute `max_memory` here and pass it as one of the kwargs when loading the model, since this can cause loading to fail.
Thanks again for your time and effort!