
Device map feature for maestro models - qwen_2.5, florence_2 & paligemma_2

Open AmazingK2k3 opened this issue 8 months ago • 3 comments

Description

As discussed in issue https://github.com/roboflow/maestro/issues/176, this PR implements the device_map feature for loading all 3 models. No change in dependencies is required.

The device parameter was replaced by device_map to stay consistent with Hugging Face and avoid confusion. For the Florence 2 model, device_map is validated so that it does not accept a dict input (e.g. {"": "cuda:0"}), and 'auto' resolves directly to an available device via the existing parse_device_spec() function.
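In outline, the Florence 2 handling looks roughly like this (a sketch: normalize_device_map is an illustrative name, only parse_device_spec() comes from the existing code, and the import path shown is an assumption):

from maestro.trainer.common.utils.device import parse_device_spec  # import path assumed

def normalize_device_map(device_map="auto"):
    # Florence 2 rejects dict-style maps such as {"": "cuda:0"}
    if isinstance(device_map, dict):
        raise ValueError("Florence 2 does not accept dict device maps")
    # None and "auto" both resolve to an available device
    return parse_device_spec(device_map or "auto")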

For Qwen 2.5 and PaliGemma 2, device_map is passed directly to from_pretrained when loading the models, with the default set to 'auto'.
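For reference, the new Qwen 2.5 loading path reduces to roughly the following (the PaliGemma 2 path is analogous; load_qwen is an illustrative wrapper, not the actual function name):

import torch
from transformers import Qwen2_5_VLForConditionalGeneration

def load_qwen(model_id_or_path, device_map="auto", revision="main", cache_dir=None):
    # device_map goes straight to from_pretrained; no model.to(device) afterwards
    return Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id_or_path,
        revision=revision,
        trust_remote_code=True,
        device_map=device_map if device_map else "auto",
        torch_dtype=torch.bfloat16,
        cache_dir=cache_dir,
    )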

The docstring for the load_model() function for all 3 model checkpoints was updated to reflect the changes.

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)

Testing

Tested loading each model with device_map set to different modes ('auto', 'cuda', 'cpu') in a cloud environment; all cases pass.
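Roughly the kind of smoke test this corresponds to (a sketch only; the parametrized test below is illustrative and the import path is an assumption):

import pytest
from maestro.trainer.models.qwen_2_5_vl.checkpoints import load_model  # path assumed

@pytest.mark.parametrize("device_map", ["auto", "cuda", "cpu"])
def test_load_model_with_device_map(device_map):
    processor, model = load_model(
        model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
        device_map=device_map,
    )
    assert processor is not None and model is not None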

I have read the CLA Document and I sign the CLA.

— AmazingK2k3, Mar 01 '25

CLA assistant check
All committers have signed the CLA.

— CLAassistant, Mar 01 '25

Hi @AmazingK2k3 👋🏻 thank you so much for your PR. Could you please explain why you decided to drop the device argument? I'm looking at the https://github.com/roboflow/maestro/issues/176 issue and if I remember correctly, we wanted to keep the device argument and add device_map allowing for:

  • Load on CPU
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cpu"
)
  • Load on MPS
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="mps"
)
  • Load on single GPU machine
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device="cuda:0"
)
  • Load model on all GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto"
)
  • Load model on specific subset of GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map={"": "cuda:0"}
)

I think device_map alone won't give us the same level of flexibility.

— SkalskiP, Mar 06 '25

Hey @SkalskiP, the main reason I dropped the device argument completely is that having two arguments for device handling, device and device_map, might confuse the user loading the model. For example, with both arguments present, a user loading the Qwen model could set device="cpu" but leave device_map as None or "auto":

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id_or_path,
    revision=revision,
    trust_remote_code=True,
    device_map=device_map if device_map else "auto",  # falls back to "auto" when None
    torch_dtype=torch.bfloat16,
    cache_dir=cache_dir,
)
model.to(device)  # moves the model again, conflicting with the device_map placement above

This will ultimately load the model across GPUs even if a specific device is requested, as stated in issue #176.

I felt it would be much simpler to have a single argument for device handling. device_map is commonly used in the transformers library as well, and it directly accepts everything device could take ('cpu', 'mps', 'cuda:0') and loads the models accordingly. If it is left as None, the models are loaded with device_map set to 'auto'.

  • Load on CPU
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="cpu"
)
  • Load on MPS
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="mps"
)
  • Load on single GPU machine
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="cuda:0"
)
  • Load model on all GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map="auto"
)
  • Load model on a specific subset of GPUs
processor, model = load_model(
    model_id_or_path="Qwen/Qwen2.5-VL-7B-Instruct",
    device_map={"": "cuda:0"}  # Not applicable to Florence 2
)

Just one argument, device_map, for all cases!

Let me know if this is okay or if there is a better way to go about it.

— AmazingK2k3, Mar 06 '25