Support M1 GPU in FARMReader

mathislucka opened this issue on Jul 15 '22

Is your feature request related to a problem? Please describe. Since haystack v1.6 we have support for PyTorch 1.12, which also means support for the M1 GPU. However, we currently initialize the device to be either cpu or cuda, depending on availability and on whether the user passes the use_gpu=True parameter. For GPU use on the M1, PyTorch actually uses the mps backend. See: https://pytorch.org/docs/stable/notes/mps.html

If we allowed users to pass the actual device into the FARMReader, this could make GPU training and inference on the M1 possible.

Describe the solution you'd like Allow the user to pass devices=[<device>] into FARMReader.__init__ and use these devices in initialize_device_settings. We could make this non-breaking by making it an optional argument to the reader init and to the device initialization.
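
Below is a minimal sketch of the proposed usage from the user's side. It is hedged: it assumes the optional devices argument on FARMReader.__init__ described above and a PyTorch build with MPS support, and the model name is only an example.

    import torch
    from haystack.nodes import FARMReader

    # Prefer the Apple Silicon GPU when available, otherwise fall back to CPU.
    device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

    reader = FARMReader(
        model_name_or_path="deepset/roberta-base-squad2",  # example model
        devices=[device],  # proposed optional argument, bypassing the cpu/cuda-only logic
    )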

mathislucka · Jul 15 '22

It is actually already there :D

mathislucka · Jul 15 '22

Reopening this, as the device is not used for the inferencer. See: https://github.com/deepset-ai/haystack/blob/632cd1c141a8b485c6ef8695685d2d8eef3ca50f/haystack/modeling/infer.py#L229

mathislucka · Jul 15 '22

Additionally, transformers does not currently support PyTorch 1.12 (see https://github.com/huggingface/transformers/issues/17971#issuecomment-1172324921). When changing the code in the Inferencer to pass on the mps device, an error is raised during prediction:

Inferencing Samples:   0%|          | 0/1 [00:00<?, ? Batches/s]
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/infer.py", line 520, in _get_predictions_and_aggregate
    logits = self.model.forward(**batch)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/model/adaptive_model.py", line 477, in forward
    output_tuple = self.language_model.forward(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/model/language_model.py", line 700, in forward
    output_tuple = self.model(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 841, in forward
    embedding_output = self.embeddings(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 105, in forward
    position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx, past_key_values_length)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 1574, in create_position_ids_from_input_ids
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
NotImplementedError: The operator 'aten::cumsum.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
python-BaseException
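
As the error message suggests, a hedged interim workaround is PyTorch's CPU fallback for operators missing on MPS. A sketch follows; the environment variable has to be set before torch is imported, and the affected ops run on the CPU, so this is slower than native MPS.

    import os

    # Must be set before `import torch`, otherwise it has no effect.
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

    import torch

    # Unsupported ops such as aten::cumsum.out now fall back to the CPU instead of raising.
    print(torch.backends.mps.is_available())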

mathislucka · Jul 15 '22

Also see this for the current state of covered ops for the mps backend:

https://github.com/pytorch/pytorch/issues/77764

mathislucka · Jul 15 '22

Hey,

Thanks for sharing this information! I am new to haystack and am wondering how to enable the GPU on a Mac Pro M1. I already have PyTorch set up with torch.backends.mps.is_available() = True. However, I still don't know how to activate it. Can you provide a bit more information?

Best

yli223 · Jul 19 '22

Hey @yli223, we do not currently support the M1 GPU. We would need to implement the changes explained by @mathislucka above in Haystack. In addition, we also need to wait for HuggingFace transformers to support PyTorch 1.12, which is required for the M1 GPU to work (more info here: https://github.com/huggingface/transformers/pull/17925).

sjrl · Jul 22 '22

Update: the HF PR has been merged to main. Therefore, we can use this feature as soon as we support the HF v4.21.2 release (as soon as it gets released). Do we need to add the optional devices parameter anywhere else except infer.py, @mathislucka @sjrl?

vblagoje · Aug 18 '22

That's great! I would say that anywhere the user can pass an option to initialize_device_settings should also accept a list of devices instead, similar to what is already done in this load function for the Inferencer: https://github.com/deepset-ai/haystack/blob/be127e5b61e60f59292a1e5d73676eb34691f668/haystack/modeling/infer.py#L175-L176

where devices is of type https://github.com/deepset-ai/haystack/blob/be127e5b61e60f59292a1e5d73676eb34691f668/haystack/modeling/infer.py#L128

So what is inconsistent at the moment is that the devices option is only supported in some places in Haystack. I think we should support it everywhere the user can pass in the use_gpu boolean.
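
A sketch of that pattern, hedged: load_component and its signature are illustrative rather than actual Haystack API, and it assumes initialize_device_settings is importable from haystack.modeling.utils as in Haystack v1.x. The fallback call mirrors the Inferencer.load snippet quoted later in this thread.

    from typing import List, Optional, Union

    import torch
    from haystack.modeling.utils import initialize_device_settings

    def load_component(use_gpu: bool = True, devices: Optional[List[Union[str, torch.device]]] = None):
        # Respect an explicit device list when given; otherwise resolve devices
        # deterministically from the use_gpu flag.
        if devices is None:
            devices, _n_gpu = initialize_device_settings(use_cuda=use_gpu, multi_gpu=False)
        # ... move models / tensors to devices[0] from here on
        return devices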

sjrl · Aug 18 '22

@sjrl, so what you are saying is that every function, including the component constructor where we currently pass use_gpu, should have devices as an optional argument. And second, we should make sure that the deterministic approach to device selection defined in initialize_device_settings is used in every case where we pass the devices parameter. Correct?

vblagoje · Aug 18 '22

So what you are saying is that every function, including the component constructor where we currently pass use_gpu, should have devices as an optional argument.

Yes I think this makes sense to help standardize how devices are specified in Haystack.

And second, we should make sure that the deterministic approach to device selection defined in initialize_device_settings is used in every case where we pass the devices parameter. Correct?

I'm not entirely sure what you mean here. Do you mean we should always use this statement everywhere we have added the devices optional parameter?

if devices is None:
    devices, n_gpu = initialize_device_settings(use_cuda=gpu, multi_gpu=False)

sjrl · Aug 18 '22

Yes, it seems to be already used everywhere, but we should make sure that it does get used, in addition to making sure we provide the devices parameter.

vblagoje · Aug 18 '22

Yes, it seems to be already used everywhere, but we should make sure that it does get used, in addition to making sure we provide the devices parameter.

Yes I agree.

sjrl · Aug 18 '22

Update: although HF has recently added support for devices in pipelines, the main blocker for running Haystack on Apple Silicon M1/M2 remains the missing MPS implementation of the torch cumsum operator, which is used extensively in all HF models.
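
A minimal repro sketch of that blocker, hedged: it needs an Apple Silicon machine with an MPS-enabled PyTorch build; on versions where aten::cumsum is missing for MPS it raises NotImplementedError, while newer releases succeed.

    import torch

    assert torch.backends.mps.is_available(), "requires an Apple Silicon Mac with an MPS-enabled PyTorch build"

    # Roughly what create_position_ids_from_input_ids does in transformers.
    mask = torch.ones(1, 8, dtype=torch.int32, device="mps")
    positions = torch.cumsum(mask, dim=1)
    print(positions)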

vblagoje · Aug 30 '22

However, seq2seq generative models still don't work (whenever GenerationMixin is used). The error is

NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

So now we have to wait for https://github.com/pytorch/pytorch/issues/86806

vblagoje · Nov 28 '22

Hi @vblagoje, the blocking issue has been fixed. May I ask what the current status of M1 GPU support is? The documentation doesn't mention Apple Silicon support, so I suppose it's still not supported:
https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration

laike9m · Oct 10 '23

@laike9m I haven't tried it in a while, tbh. Having looked at https://github.com/pytorch/pytorch/issues/86806, it seems like it should work now. Please try it out and let us know. If not, I'll get to this task next week or so.

vblagoje · Oct 10 '23

Thanks. I can give it a try. Where can I find the instructions to enable it? (Sorry, I'm pretty new to haystack.)

laike9m · Oct 10 '23

Still getting the error: NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.

Running macOS Sonoma 14.2.1 (23C71)

I have PyTorch 2.1.2
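
A quick, hedged diagnostic sketch for this situation: confirm which PyTorch build is picked up at runtime and whether it was built with MPS support before assuming the operator is still missing (torch.backends.mps.is_built is available in recent PyTorch releases).

    import torch

    print(torch.__version__)                  # should report 2.1.2 here
    print(torch.backends.mps.is_built())      # True if this build includes MPS support
    print(torch.backends.mps.is_available())  # True on Apple Silicon with a recent macOS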

lvdinergy · Dec 21 '23