segment-anything
Added changes for MPS
This PR adds support for running the model using the MPS backend on macOS devices.
As per the README, pytorch>=2.0 is strongly recommended, as it comes with wider support for various operations on MPS. Also, torchvision::nms is still not supported on MPS, so one has to set PYTORCH_ENABLE_MPS_FALLBACK=1 to enable a CPU fallback for that operator.
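For reference, a minimal sketch of enabling the fallback from Python; the variable must be set before torch is first imported (exporting PYTORCH_ENABLE_MPS_FALLBACK=1 in the shell works equally well):

import os

# Must be set before the first "import torch", or the fallback is ignored.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch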
Tested on M1 using the vit_l model. Runtime of the amg script:
- with device=cpu: ~124s
- with device=mps: ~73s
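For context, the timings above were produced by the repo's amg script; an invocation along these lines (paths and checkpoint name are placeholders, flags mirror the command shown later in this thread) selects the device:

python scripts/amg.py --input ./images/truck.jpg --output ./out --checkpoint sam_vit_l_0b3195.pth --model-type vit_l --device mps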
Resolves the following issues:
- https://github.com/facebookresearch/segment-anything/issues/119
- https://github.com/facebookresearch/segment-anything/issues/94
Hi @DrSleep!
Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with "CLA signed". The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
It works, but is it expected to get worse results?
MPS on a MacBook M1: [image]
CUDA (Nvidia Tesla A30): [image]
MPS support in PyTorch is still experimental, so I think the differences are due to that; however, I must admit I have not looked in detail at which operations cause the difference in this particular model (I did print some statistics of intermediate activations, and they were comparable between CPU and MPS). I first tried PyTorch 1.12.0, which required many more code changes and in the end still produced no masks at all. PyTorch 2.0 came with more MPS fixes, so I imagine it will only get better with time.
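For anyone who wants to reproduce that kind of check, here is a minimal sketch (the statistics and hook targets are illustrative, not necessarily what was actually used) of recording per-module activation statistics via forward hooks, so the same input can be run on "cpu" and on "mps" and the results compared:

import torch

def attach_stat_hooks(model, device_name, stats):
    # Record mean/std of each module's output during one forward pass.
    handles = []
    for name, module in model.named_modules():
        def hook(mod, inputs, output, name=name):
            if torch.is_tensor(output):
                out = output.detach().float().cpu()
                stats.setdefault(name, {})[device_name] = (out.mean().item(), out.std().item())
        handles.append(module.register_forward_hook(hook))
    return handles

# Usage: run the same input through the model once per device, then compare
# stats[name]["cpu"] with stats[name]["mps"] module by module; call
# h.remove() on each returned handle afterwards.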
I still get a segfault when I run the amg.py script:
python scripts/amg.py --input './notebooks/images/truck.jpg' --output ./out --checkpoint models/sam_vit_l_0b3195.pth --model-type vit_l --device 'mps'
Loading model...
Processing './notebooks/images/truck.jpg'...
[1] 97607 segmentation fault python scripts/amg.py --input './notebooks/images/truck.jpg' --output ./out
@KevinColemanInc I was getting a segfault but was able to fix it after
pip install torch==2.0.0 torchvision==0.15.0
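A quick sanity check (not part of the fix itself) that the pinned versions are the ones actually being imported:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"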
I'm not sure if this works yet, or maybe I'm dense. On a Mac M1 Ultra Studio:

import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"  # must be set before importing torch

import urllib.request

import cv2
import numpy as np
import torch
from segment_anything import sam_model_registry, SamPredictor, SamAutomaticMaskGenerator

# expanduser so the "~" in the checkpoint path actually resolves
sam = sam_model_registry["vit_h"](checkpoint=os.path.expanduser("~/GitHub/segment-anything/models/sam_vit_h_4b8939.pth"))

# fetch a test image and decode it to RGB
url = "https://upload.wikimedia.org/wikipedia/commons/e/e7/Everest_North_Face_toward_Base_Camp_Tibet_Luca_Galuzzi_2006.jpg"
req = urllib.request.urlopen(url)
arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
image = cv2.cvtColor(cv2.imdecode(arr, -1), cv2.COLOR_BGR2RGB)

# general device selection, returns "mps" on my system
# (torch.backends.mps.is_available() is the non-deprecated check)
device = "cuda" if torch.cuda.is_available() else "mps" if torch.has_mps else "cpu"
sam.to(device=device)

mask_generator = SamAutomaticMaskGenerator(sam)
result = mask_generator.generate(image)
Which then returns:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/segment_anything/automatic_mask_generator.py", line 163, in generate
mask_data = self._generate_masks(image)
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/segment_anything/automatic_mask_generator.py", line 206, in _generate_masks
crop_data = self._process_crop(image, crop_box, layer_idx, orig_size)
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/segment_anything/automatic_mask_generator.py", line 245, in _process_crop
batch_data = self._process_batch(points, cropped_im_size, crop_box, orig_size)
File "/opt/homebrew/Caskroom/mambaforge/base/lib/python3.10/site-packages/segment_anything/automatic_mask_generator.py", line 277, in _process_batch
in_points = torch.as_tensor(transformed_points, device=self.predictor.device)
TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
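The traceback shows float64 NumPy coordinates being sent to the MPS device, which has no float64 support. Purely as an illustration (not necessarily the exact change in the branch below), the failure mode and the usual workaround look like this, assuming an MPS-capable build:

import numpy as np
import torch

points = np.random.rand(4, 2)  # NumPy arrays default to float64
# torch.as_tensor(points, device="mps") raises the TypeError above;
# downcasting to float32 first avoids it:
in_points = torch.as_tensor(points.astype(np.float32), device="mps")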
NVM, I am dense. Gotta install this specific branch, as main doesn't work. You can do so this way:
python -m pip install 'segment-anything @ git+https://github.com/DrSleep/segment-anything@cd507390ca9591951d0bfff2723d1f6be8792bb8'
Yes, I installed this specific branch (Added changes for MPS #122), but this code falls back to run on the CPU and ends up even slower than plain CPU. On a MacBook M1 Pro:
1. This is my code:
import os
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"]="1"
import time
import cv2
import torch
import torchvision
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
print("PyTorch version:", torch.__version__)
print("Torchvision version:", torchvision.__version__)
print("CUDA is available:", torch.cuda.is_available())
tt = time.time()
sam_checkpoint = "sam_vit_l_0b3195.pth"
model_type = "vit_l"
device = "mps"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)

tt = time.time()  # reset the timer so only mask generation is measured
mask_generator = SamAutomaticMaskGenerator(sam)
image = cv2.imread('dog.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
result = mask_generator.generate(image)
print("result", time.time() - tt, len(result))
2. This is the result log:
PyTorch version: 2.0.0
Torchvision version: 0.15.0
CUDA is available: False
/Users/felix/kuangkuang/gitkavin/segment-anything-mps-support/segment_anything/modeling/mask_decoder.py:126:
UserWarning: MPS: no support for int64 repeats mask, casting it to int32 (Triggered internally at
/Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Repeat.mm:236.)
src = torch.repeat_interleave(image_embeddings, tokens.shape[0], dim=0)
[W MPSFallback.mm:11] Warning: The operator 'torchvision::nms' is not currently supported on the MPS backend and
will fall back to run on the CPU. This may have performance implications. (function operator())
result 65.58419871330261 66
3. If I change device = "cpu", this is the result log:
PyTorch version: 2.0.0
Torchvision version: 0.15.0
CUDA is available: False
result 33.75937104225159 66
4. If I change the environment to a newer nightly version, this is the error log:
PyTorch version: 2.1.0.dev20230512
Torchvision version: 0.16.0.dev20230512
CUDA is available: False
Segmentation fault: 11
How can I solve this problem? I want to run segment-anything on MPS.
However, if I use predictor.predict, it does run on MPS:
import time
import numpy as np

tt = time.time()
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
sam.to(device=device)

point_coords = np.array([[200.0, 450.0]])
point_labels = np.array([1])

# load_img_to_array is a project-local helper (not part of segment_anything)
# that reads an image file into an RGB numpy array
img = load_img_to_array("./example/remove-anything/dog.jpg")

predictor = SamPredictor(sam)
predictor.set_image(img)
masks, scores, logits = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,
)
masks = masks.astype(np.uint8) * 255
print("result", time.time() - tt, masks.shape)
In the meantime, is there a working solution for getting segment-anything to run using MPS?
The following was performed on an M3 Max 128GB MacBook Pro with PyTorch 2.3 and torchvision 0.18. I set the device to 'mps' and used the vit_h checkpoint.
The preview times below are from my second attempt. The times to generate the masks were 17s and 43s respectively on the second run; on the first run (not represented in the images) they were 33s and 46s respectively, so it went quite a bit faster the second time. In both runs, however, the numbers of masks generated were 66 and 90 respectively.
So, though the M1 might not produce the best masks, it is possible to get high-quality masks if one upgrades their equipment.