Support M1 GPU in FARMReader
Is your feature request related to a problem? Please describe.
Since haystack v1.6 we have support for pytorch 1.12, which also means support for the M1 GPU. However, we currently initialize the device to be either `cpu` or `cuda` depending on availability and on whether the user passes in the `use_gpu=True` parameter. For GPU use on the M1, pytorch actually uses the `mps` backend. See: https://pytorch.org/docs/stable/notes/mps.html
If we could allow users to pass the actual device into the FARMReader, this would make GPU training and inference on the M1 possible.
Describe the solution you'd like
Allow the user to pass `devices=[<device>]` into `FARMReader.__init__` and use these devices in `initialize_device_settings`. We could make this non-breaking by making it an optional argument to the reader init and the device initialization.
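For illustration, this is roughly how the proposed API could look from the user's side. This is only a sketch: the `devices` argument and its acceptance of `torch.device` objects are what this issue proposes, not the current `FARMReader` signature.

```python
import torch
from haystack.nodes import FARMReader

# Sketch only: assumes FARMReader.__init__ gains an optional `devices` argument
# that is forwarded to initialize_device_settings.
devices = [torch.device("mps")] if torch.backends.mps.is_available() else [torch.device("cpu")]

reader = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2",
    devices=devices,  # proposed optional argument; omitting it would keep the use_gpu behaviour
)
```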
It is actually already there :D
Reopening this, as the device is not used for the inferencer. See: https://github.com/deepset-ai/haystack/blob/632cd1c141a8b485c6ef8695685d2d8eef3ca50f/haystack/modeling/infer.py#L229
Additionally, transformers does not currently support pytorch 1.12 (see https://github.com/huggingface/transformers/issues/17971#issuecomment-1172324921). When changing the code in the inferencer to pass on the `mps` device, an error is raised during prediction:
Inferencing Samples: 0%| | 0/1 [00:00<?, ? Batches/s]
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/infer.py", line 520, in _get_predictions_and_aggregate
logits = self.model.forward(**batch)
File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/model/adaptive_model.py", line 477, in forward
output_tuple = self.language_model.forward(
File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/haystack/modeling/model/language_model.py", line 700, in forward
output_tuple = self.model(
File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 841, in forward
embedding_output = self.embeddings(
File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 105, in forward
position_ids = create_position_ids_from_input_ids(input_ids, self.padding_idx, past_key_values_length)
File "/opt/homebrew/Caskroom/miniforge/base/envs/fanal/lib/python3.9/site-packages/transformers/models/roberta/modeling_roberta.py", line 1574, in create_position_ids_from_input_ids
incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
NotImplementedError: The operator 'aten::cumsum.out' is not current implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
python-BaseException
Also see this for the current state of covered ops for the mps backend:
https://github.com/pytorch/pytorch/issues/77764
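As a stopgap, the CPU fallback mentioned in the error message can be enabled via an environment variable. A minimal sketch, assuming the variable is read before torch initialises the MPS backend (it only works around missing ops by running them on CPU, so it is slower than native MPS):

```python
import os

# Must be set before torch initialises MPS; setting it before the import is the safe option.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402

if torch.backends.mps.is_available():
    # aten::cumsum.out is the op from the traceback above; with the fallback enabled
    # it runs on CPU instead of raising NotImplementedError.
    x = torch.ones(2, 3, dtype=torch.int64, device="mps")
    print(torch.cumsum(x, dim=1))
```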
Hey,
Thanks for sharing this information! I am new to haystack and wondering how to enable the GPU on a Mac Pro M1. I have PyTorch set up already with `torch.backends.mps.is_available()` returning True, but I still don't know how to activate it. Can you provide a bit more information?
Best
Hey @yli223, we do not currently support the M1 GPU. We would need to implement the changes explained by @mathislucka above in Haystack. In addition, we also need to wait for HuggingFace transformers to support PyTorch 1.12, which is required for the M1 GPU to work (more info here: https://github.com/huggingface/transformers/pull/17925).
Update: the HF PR has been merged to main. Therefore, we can use this feature as soon as we support the HF v4.21.2 release (once it gets released). Do we need to add the `devices` optional parameter anywhere else except infer.py, @mathislucka @sjrl?
That's great! I would say that anywhere the user passes an option to `initialize_device_settings` should have the option of passing a list of `devices` instead. Similar to what is already done in this load function for the Inferencer:
https://github.com/deepset-ai/haystack/blob/be127e5b61e60f59292a1e5d73676eb34691f668/haystack/modeling/infer.py#L175-L176
where `devices` is of the type shown here:
https://github.com/deepset-ai/haystack/blob/be127e5b61e60f59292a1e5d73676eb34691f668/haystack/modeling/infer.py#L128
So what is inconsistent at the moment is that the `devices` option is only supported in some places in Haystack. And I think we should support it everywhere the user can pass in the `use_gpu` boolean.
@sjrl, so what you are saying is that every function, including the component constructor where we currently pass `use_gpu`, should have `devices` as an optional argument. And second, we should make sure that the deterministic approach to device selection defined in `initialize_device_settings` is used in every case where we pass the devices parameter. Correct?
so what you are saying is that every function, including the component constructor where we currently pass `use_gpu`, should have devices as an optional argument.
Yes I think this makes sense to help standardize how devices are specified in Haystack.
And second, we should make sure that the deterministic approach to device selection defined in initialize_device_settings, is used in every case where we pass the devices parameter. Correct?
I'm not entirely sure what you mean here. Do you mean we should always use this statement everywhere we have added the `devices` optional parameter?
if devices is None:
    devices, n_gpu = initialize_device_settings(use_cuda=gpu, multi_gpu=False)
Yes, it seems to be already used everywhere, but we should make sure that it does get used in addition to making sure we provide the `devices` parameter.
Yes, it seems to be already used everywhere, but we should make sure that it does get used in addition to making sure we provide the devices parameter.
Yes I agree.
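For reference, a rough sketch of the pattern being agreed on here: a component constructor keeps the `use_gpu` boolean but also accepts an optional `devices` list, falling back to `initialize_device_settings` when none is given. The component name is hypothetical and the exact helper signature may differ from the version linked above:

```python
from typing import List, Optional

import torch

from haystack.modeling.utils import initialize_device_settings  # existing helper (import path assumed)


class SomeComponent:  # hypothetical component, for illustration only
    def __init__(self, use_gpu: bool = True, devices: Optional[List[torch.device]] = None):
        if devices is None:
            # existing deterministic selection based on the use_gpu boolean
            self.devices, self.n_gpu = initialize_device_settings(use_cuda=use_gpu, multi_gpu=False)
        else:
            # honour the explicit device list, e.g. [torch.device("mps")]
            self.devices = devices
            self.n_gpu = sum(1 for d in devices if d.type == "cuda")
```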
Update: although HF has recently added support for devices in pipelines, the main blocker for Haystack deployment on Apple Silicon M1/M2 remains the MPS implementation of the torch cumsum operator, which is used extensively in all HF models.
However, seq2seq generative models still don't work (whenever GenerationMixin is used). The error is
NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
So now we have to wait for https://github.com/pytorch/pytorch/issues/86806
Hi @vblagoje, the blocking issue has been fixed. May I ask what the current status of M1 GPU support is? At least the documentation doesn't mention Apple Silicon support, so I suppose it's still not supported:
https://docs.haystack.deepset.ai/docs/enabling-gpu-acceleration
@laike9m I haven't tried it in a while tbh. Having looked at https://github.com/pytorch/pytorch/issues/86806, it seems like it should work now. Please try it out and let us know. If not, I'll get to this task next week or so.
Thanks. I can give it a try; where can I find the instructions to enable it? (Sorry, I'm pretty new to haystack.)
Still getting the error:
NotImplementedError: The operator 'aten::remainder.Tensor_out' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on https://github.com/pytorch/pytorch/issues/77764. As a temporary fix, you can set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Running macOS Sonoma 14.2.1 (23C71)
I have PyTorch 2.1.2
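For anyone hitting this, a quick probe (illustrative only) of whether tensor-tensor remainder, the op reported in the error above, runs natively on MPS with the installed torch build:

```python
import torch

if torch.backends.mps.is_available():
    a = torch.tensor([5, 7], device="mps")
    b = torch.tensor([3, 4], device="mps")
    try:
        # Tensor-tensor remainder is the operation named in the error above.
        print(torch.remainder(a, b))
    except NotImplementedError as err:
        print(f"remainder still missing on MPS: {err}")
else:
    print("MPS backend not available on this machine")
```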