
Device map feature for maestro models - qwen_2.5, florence_2 & paligemma_2

Open AmazingK2k3 opened this issue 8 months ago • 3 comments

Description

As discussed in issue https://github.com/roboflow/maestro/issues/176, this PR implements the device_map feature for loading all 3 models. No change in dependencies is required.

The device parameter was replaced by device_map to stay consistent with Hugging Face and avoid confusion. For the Florence 2 model, device_map is validated so that it does not accept a dict input (e.g. {"": "cuda:0"}), and 'auto' resolves directly to an available device via the existing parse_device_spec() function.
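In outline, the Florence 2 handling looks roughly like this (a sketch: normalize_device_map is an illustrative name, only parse_device_spec() comes from the existing code, and the import path shown is an assumption):

from maestro.trainer.common.utils.device import parse_device_spec  # import path assumed

def normalize_device_map(device_map="auto"):
    # Florence 2 rejects dict-style maps such as {"": "cuda:0"}
    if isinstance(device_map, dict):
        raise ValueError("Florence 2 does not accept dict device maps")
    # None and "auto" both resolve to an available device
    return parse_device_spec(device_map or "auto")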

For Qwen 2.5 and PaliGemma 2, device_map is passed directly to from_pretrained when loading the models, with the default set to 'auto'.
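For reference, the new Qwen 2.5 loading path reduces to roughly the following (the PaliGemma 2 path is analogous; load_qwen is an illustrative wrapper, not the actual function name):

import torch
from transformers import Qwen2_5_VLForConditionalGeneration

def load_qwen(model_id_or_path, device_map="auto", revision="main", cache_dir=None):
    # device_map goes straight to from_pretrained; no model.to(device) afterwards
    return Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id_or_path,
        revision=revision,
        trust_remote_code=True,
        device_map=device_map if device_map else "auto",
        torch_dtype=torch.bfloat16,
        cache_dir=cache_dir,
    )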

The docstring for the load_model() function for all 3 model checkpoints was updated to reflect the changes.

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)

Testing

Tested loading each model with device_map set to different modes ('auto', 'cuda', 'cpu') in a cloud environment; all cases pass.
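Roughly the kind of smoke test this corresponds to (a sketch only; the parametrized test below is illustrative and the import path is an assumption):

import pytest
from maestro.trainer.models.qwen_2_5_vl.checkpoints import load_model  # path assumed

@pytest.mark.parametrize("device_map", ["auto", "cuda", "cpu"])
def test_load_model_with_device_map(device_map):
    processor, model = load_model(
        model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
        device_map=device_map,
    )
    assert processor is not None and model is not None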

I have read the CLA Document and I sign the CLA.

— AmazingK2k3, Mar 01 '25

CLA assistant check
All committers have signed the CLA.

— CLAassistant, Mar 01 '25

Hi @AmazingK2k3 👋🏻 thank you so much for your PR. Could you please explain why you decided to drop the device argument? I'm looking at the https://github.com/roboflow/maestro/issues/176 issue and if I remember correctly, we wanted to keep the device argument and add device_map allowing for:

  • Load on CPU
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cpu"
)
  • Load on MPS
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="mps"
)
  • Load on single GPU machine
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cuda:0"
)
  • Load model on all GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto"
)
  • Load model on specific subset of GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map={"": "cuda:0"}
)

I think device_map alone won't give us the same level of flexibility.

— SkalskiP, Mar 06 '25

Hey @SkalskiP, the main reason I dropped the device argument completely is that having two arguments for device handling, device and device_map, might confuse the user loading the model. For example, with both arguments present, a user loading the Qwen model could set device="cpu" but leave device_map as None or "auto":

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id_or_path,
    revision=revision,
    trust_remote_code=True,
    device_map=device_map if device_map else "auto",  # falls back to "auto" when None
    torch_dtype=torch.bfloat16,
    cache_dir=cache_dir,
)
model.to(device)  # moves the model again, conflicting with the device_map placement above

This will ultimately load the model across GPUs even if a specific device is requested, as stated in issue #176.

I felt it would be much simpler to have a single argument for device handling. device_map is commonly used in the transformers library as well, and it directly accepts everything device could take ('cpu', 'mps', 'cuda:0') and loads the models accordingly. If it is left as None, the models are loaded with device_map set to 'auto'.

  • Load on CPU
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="cpu"
)
  • Load on MPS
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="mps"
)
  • Load on single GPU machine
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="cuda:0"
)
  • Load model on all GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto"
)
  • Load model on a specific subset of GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map={"": "cuda:0"}  # Not applicable to Florence 2
)

Just one argument, device_map, for all cases!

Let me know if this is okay or if there is a better way to go about it.

— AmazingK2k3, Mar 06 '25