segment-anything
Added changes for MPS
This PR adds support for running the model using the MPS backend on macOS devices.
As per the README, pytorch>=2.0 is strongly recommended, as it comes with wider support for various operations on MPS. Also, torchvision::nms is still not supported on MPS, so one has to set PYTORCH_ENABLE_MPS_FALLBACK=1 to enable a CPU fallback for that operator.
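For reference, a minimal sketch of enabling the fallback from Python; the variable must be set before torch is first imported (exporting PYTORCH_ENABLE_MPS_FALLBACK=1 in the shell works equally well):

import os

# Must be set before the first "import torch", or the fallback is ignored.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch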
Tested on M1 using the vit_l model. Runtime of the amg script:
- with device=cpu: ~124s
- with device=mps: ~73s
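For context, the timings above were produced by the repo's amg script; an invocation along these lines (paths and checkpoint name are placeholders, flags mirror the command shown later in this thread) selects the device:

python scripts/amg.py --input ./images/truck.jpg --output ./out --checkpoint sam_vit_l_0b3195.pth --model-type vit_l --device mps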
Resolves the following issues:
- https://github.com/facebookresearch/segment-anything/issues/119
- https://github.com/facebookresearch/segment-anything/issues/94
Hi @DrSleep!
Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with "CLA signed". The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
It works, but is it expected to get worse results?
MPS on a MacBook M1: [image]
CUDA (Nvidia Tesla A30): [image]
MPS support in PyTorch is still experimental, so I think the differences are due to that; however, I must admit I have not looked in detail at which operations cause the difference in this particular model (I did print some statistics of intermediate activations, and they were comparable between CPU and MPS). I first tried PyTorch 1.12.0, which required many more code changes and in the end still produced no masks at all. PyTorch 2.0 came with more MPS fixes, so I imagine it will only get better with time.
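For anyone who wants to reproduce that kind of check, here is a minimal sketch (the statistics and hook targets are illustrative, not necessarily what was actually used) of recording per-module activation statistics via forward hooks, so the same input can be run on "cpu" and on "mps" and the results compared:

import torch

def attach_stat_hooks(model, device_name, stats):
    # Record mean/std of each module's output during one forward pass.
    handles = []
    for name, module in model.named_modules():
        def hook(mod, inputs, output, name=name):
            if torch.is_tensor(output):
                out = output.detach().float().cpu()
                stats.setdefault(name, {})[device_name] = (out.mean().item(), out.std().item())
        handles.append(module.register_forward_hook(hook))
    return handles

# Usage: run the same input through the model once per device, then compare
# stats[name]["cpu"] with stats[name]["mps"] module by module; call
# h.remove() on each returned handle afterwards.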
I still get a segfault when I run the amg.py script:
python scripts/amg.py --input './notebooks/images/truck.jpg' --output ./out --checkpoint models/sam_vit_l_0b3195.pth --model-type vit_l --device 'mps'
Loading model...
Processing './notebooks/images/truck.jpg'...
[1] 97607 segmentation fault python scripts/amg.py --input './notebooks/images/truck.jpg' --output ./out
@KevinColemanInc I was getting a segfault but was able to fix it after
pip install torch==2.0.0 torchvision==0.15.0
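A quick sanity check (not part of the fix itself) that the pinned versions are the ones actually being imported:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"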
I'm not sure if this works yet, or maybe I'm dense. On a Mac M1 Ultra Studio:

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must be set before importing torch

import urllib.request

import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# expanduser so the "~" in the checkpoint path actually resolves
sam = sam_model_registry["vit_h"](checkpoint=os.path.expanduser("~/GitHub/segment-anything/models/sam_vit_h_4b8939.pth"))

# fetch a test image and decode it to RGB
url = "https://upload.wikimedia.org/wikipedia/commons/e/e7/Everest_North_Face_toward_Base_Camp_Tibet_Luca_Galuzzi_2006.jpg"
req = urllib.request.urlopen(url)
arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
image = cv2.cvtColor(cv2.imdecode(arr, -1), cv2.COLOR_BGR2RGB)

# general device selection, returns "mps" on my system
# (torch.backends.mps.is_available() is the non-deprecated check)
device = "cuda" if torch.cuda.is_available() else "mps" if torch.has_mps else "cpu"
sam.to(device=device)

mask_generator = SamAutomaticMaskGenerator(sam)
result = mask_generator.generate(image)
Which then returns:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/segment_anything/automatic_mask_generator.py", line 163, in generate
mask_data = self._generate_masks(image)
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/segment_anything/automatic_mask_generator.py", line 206, in _generate_masks
crop_data = self._process_crop(image, crop_box, layer_idx, orig_size)
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/segment_anything/automatic_mask_generator.py", line 245, in _process_crop
batch_data = self._process_batch(points, cropped_im_size, crop_box, orig_size)
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/segment_anything/automatic_mask_generator.py", line 277, in _process_batch
in_points = torch.as_tensor(transformed_points, device=self.predictor.device)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
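The traceback shows float64 NumPy coordinates being sent to the MPS device, which has no float64 support. Purely as an illustration (not necessarily the exact change in the branch below), the failure mode and the usual workaround look like this, assuming an MPS-capable build:

import numpy as np
import torch

points = np.random.rand(4, 2)  # NumPy arrays default to float64
# torch.as_tensor(points, device="mps") raises the TypeError above;
# downcasting to float32 first avoids it:
in_points = torch.as_tensor(points.astype(np.float32), device="mps")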
NVM, I am dense. Gotta install this specific branch, as main doesn't work. You can do so this way:
python -m pip install 'segment-anything @ git+https://github.com/DrSleep/segment-anything@cd507390ca9591951d0bfff2723d1f6be8792bb8'
Yes, I installed this specific branch (Added changes for MPS #122), but this code falls back to run on the CPU and ends up even slower than plain CPU. On a MacBook M1 Pro:
1. This is my code:
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"]="1"
import time
import cv2
import torch
import torchvision
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
print("PyTorch version:", torch.__version__)
print("Torchvision version:", torchvision.__version__)
print("CUDA is available:", torch.cuda.is_available())
tt = time.time()
sam_checkpoint = "sam_vit_l_0b3195.pth"
model_type = "vit_l"
device = "mps"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)

tt = time.time()  # reset the timer so only mask generation is measured
mask_generator = SamAutomaticMaskGenerator(sam)
image = cv2.imread('dog.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
result = mask_generator.generate(image)
print("result", time.time() - tt, len(result))
2. This is the result log:
PyTorch version: 2.0.0
Torchvision version: 0.15.0
CUDA is available: False
/Users/felix/kuangkuang/gitkavin/segment-anything-mps-support/segment_anything/modeling/mask_decoder.py:126:
UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at
/Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
src = torch.repeat_interleave(image_embeddings, tokens.shape[0], dim=0)
[W MPSFallback.mm:11] Warning: The operator 'torchvision::nms' is not currently supported on the MPS backend and
will fall back to run on the CPU. This may have performance implications. (function operator())
result 65.58419871330261 66
3. If I change device = "cpu", this is the result log:
PyTorch version: 2.0.0
Torchvision version: 0.15.0
CUDA is available: False
result 33.75937104225159 66
4. If I change the environment to a newer nightly version, this is the error log:
PyTorch version: 2.1.0.dev20230512
Torchvision version: 0.16.0.dev20230512
CUDA is available: False
Segmentation fault: 11
How can I solve this problem? I want to run segment-anything on MPS.
However, if I use predictor.predict, it does run on MPS:
import time
import numpy as np

tt = time.time()
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)

point_coords = np.array([[200.0, 450.0]])
point_labels = np.array([1])

# load_img_to_array is a project-local helper (not part of segment_anything)
# that reads an image file into an RGB numpy array
img = load_img_to_array("./example/remove-anything/dog.jpg")

predictor = SamPredictor(sam)
predictor.set_image(img)
masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
masks = masks.astype(np.uint8) * 255
print("result", time.time() - tt, masks.shape)
In the meantime, is there a working solution for getting segment-anything to run using MPS?
The following was performed on an M3 Max 128GB MacBook Pro with PyTorch 2.3 and torchvision 0.18. I set the device to 'mps' and used the vit_h checkpoint.
The preview times below are from my second attempt. The times to generate the masks were 17s and 43s respectively on the second run; on the first run (not represented in the images) they were 33s and 46s respectively, so it went quite a bit faster the second time. In both runs, however, the numbers of masks generated were 66 and 90 respectively.
So, though the M1 might not produce the best masks, it is possible to get high-quality masks if one upgrades their equipment.