segment-anything

Documentation on the differences between the different models

Open eduardo4jesus opened this issue 2 years ago • 3 comments

Currently, there are three model types available.

I could not find any documentation on the difference between them. Is there any available? If not, could someone elaborate on that?

Many thanks.

eduardo4jesus, Apr 22 '23 15:04

There is a paper accompanying the repository. The models are the same except for the size of the neural network: B stands for "base" and is the smallest, L is "large", and H is "huge". The paper reports that the performance difference between L and H is small, so I would recommend L if your machine supports it. However, B is lighter and not far behind in performance.
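For anyone wanting to switch between the three sizes, here is a minimal sketch of how that could look. It assumes the `segment_anything` package is installed and that a matching checkpoint file has already been downloaded. `MODEL_TYPES` and `load_sam` are hypothetical names used only for illustration, and the checkpoint path is whatever you saved locally.

```python
# Size order as described in this thread: B (base) < L (large) < H (huge).
# MODEL_TYPES and load_sam are illustrative names, not part of the library.
MODEL_TYPES = {"vit_b": "base", "vit_l": "large", "vit_h": "huge"}


def load_sam(model_type, checkpoint_path, device="cpu"):
    """Load one SAM variant from a locally downloaded checkpoint file."""
    if model_type not in MODEL_TYPES:
        raise ValueError(
            f"unknown model type {model_type!r}; expected one of {list(MODEL_TYPES)}"
        )
    # Imported lazily so the validation above works without the package installed.
    from segment_anything import sam_model_registry

    # The checkpoint must match the chosen model type.
    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    sam.to(device=device)
    return sam
```

Something like `load_sam("vit_l", "path/to/your/vit_l_checkpoint.pth", device="cuda")` would then give the "large" variant recommended above, provided the GPU has enough memory.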

franchesoni, Apr 23 '23 14:04

@franchesoni, thank you so much. I opened PR #300 on this. I would appreciate your feedback.

eduardo4jesus, Apr 28 '23 02:04

I've run extensive tests on the models using a wide variety of images. Here is part of the log printed during testing, along with a sample image (run locally on my RTX 3080):

vit_h
Registering model... 12:48:03
Reading image... 12:48:08
Making masks... 12:48:08
Done at: 12:48:14 | Amount: 13
Making image from mask... 12:48:14
Done...? | 12:48:17 | Time taken: 10.918561458587646
vit_h-14-52-09-3 6000053882598877-13-masks

vit_l
Registering model... 12:48:17
Reading image... 12:48:19
Making masks... 12:48:19
Done at: 12:48:22 | Amount: 17
Making image from mask... 12:48:22
Done...? | 12:48:25 | Time taken: 5.4358086585998535
vit_l-14-55-36-3 4636905193328857-17-masks

vit_b
Registering model... 12:48:25
Reading image... 12:48:26
Making masks... 12:48:26
Done at: 12:48:28 | Amount: 10
Making image from mask... 12:48:28
Done...? | 12:48:31 | Time taken: 2.9744369983673096
vit_b-14-58-04-2 34617018699646-10-masks

On average, I found that vit_l offers the best speed/accuracy trade-off. vit_h is the most accurate but the slowest, and vit_b is the fastest but the least accurate. In the run above, vit_h took about 10.9 s, vit_l about 5.4 s, and vit_b about 3.0 s.
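A comparison like this can be reproduced with a small timing harness. The following is only a sketch, not the script that produced the log above. It assumes the masks come from `SamAutomaticMaskGenerator`, which the log does not confirm. The helper names (`timed`, `benchmark`, `sam_make_masks`) and the `checkpoints` dictionary of local file paths are hypothetical.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark(model_types, make_masks):
    """Run make_masks(model_type) once per model type.

    Returns {model_type: (number_of_masks, seconds)}.
    """
    results = {}
    for model_type in model_types:
        masks, seconds = timed(make_masks, model_type)
        results[model_type] = (len(masks), seconds)
    return results


def sam_make_masks(image, checkpoints):
    """Build a make_masks callable backed by SAM (assumed setup).

    image: an RGB uint8 array of shape (H, W, 3).
    checkpoints: dict mapping model type to a local checkpoint path.
    """
    # Imported lazily so the harness itself has no hard dependency on SAM.
    from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

    def make_masks(model_type):
        # Note: the measured time includes loading the model from disk.
        sam = sam_model_registry[model_type](checkpoint=checkpoints[model_type])
        return SamAutomaticMaskGenerator(sam).generate(image)

    return make_masks
```

Calling `benchmark(["vit_h", "vit_l", "vit_b"], sam_make_masks(image, checkpoints))` would give a mask count and a wall-clock time per model. A single run on one image is noisy, so averaging over several images gives a fairer comparison.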

pinksloyd, May 05 '23 01:05