Documentation on the differences between the different models
Currently, there are three model types available:
default or vit_h: ViT-H SAM model.
vit_l: ViT-L SAM model.
vit_b: ViT-B SAM model.
I could not find any documentation on the difference between them. Is there any available? If not, could someone elaborate on that?
Many thanks.
There is a paper accompanying the repository. The models are the same except for neural network size: B stands for "base" and is the smallest, L is "large", and H is "huge". The paper reports that the performance difference between L and H is small, so I would recommend L if your machine supports it. B is lighter, though, and not far behind in performance.
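For anyone who wants to switch between the three sizes, here is a minimal sketch of how that could look. It assumes the `segment_anything` package is installed and that you have already downloaded the checkpoint matching the chosen model type. `resolve_model_type` and `load_sam` are hypothetical helper names made up for this example, not part of the library, and the checkpoint path is whatever you saved the file as.

```python
# Registry keys from the list above, ordered smallest to largest.
MODEL_TYPES = ("vit_b", "vit_l", "vit_h")


def resolve_model_type(name: str) -> str:
    """Map a user-facing name to a registry key ("default" means "vit_h")."""
    key = "vit_h" if name == "default" else name
    if key not in MODEL_TYPES:
        raise ValueError(f"unknown model type {name!r}; expected one of {MODEL_TYPES}")
    return key


def load_sam(name: str, checkpoint_path: str, device: str = "cuda"):
    """Hypothetical helper: build a SAM model of the requested size.

    The checkpoint file must match the model type (a ViT-L checkpoint
    will not load into a ViT-B model, for instance).
    """
    # Imported lazily so resolve_model_type works without the package installed.
    from segment_anything import sam_model_registry

    sam = sam_model_registry[resolve_model_type(name)](checkpoint=checkpoint_path)
    sam.to(device=device)
    return sam
```

Following the recommendation above, one would start with `load_sam("vit_l", ...)` and drop to `"vit_b"` only if memory or speed becomes a problem.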
@franchesoni, thank you so much. I opened PR #300 for this. I would appreciate your feedback.
I've run extensive testing on the models using a wide variety of images. Here is part of the print log from testing, along with a sample image (run locally on my RTX 3080):
vit_h
Registering model... 12:48:03
Reading image... 12:48:08
Making masks... 12:48:08
Done at: 12:48:14 | Amount: 13
Making image from mask... 12:48:14
Done...? | 12:48:17 | Time taken: 10.918561458587646

vit_l
Registering model... 12:48:17
Reading image... 12:48:19
Making masks... 12:48:19
Done at: 12:48:22 | Amount: 17
Making image from mask... 12:48:22
Done...? | 12:48:25 | Time taken: 5.4358086585998535

vit_b
Registering model... 12:48:25
Reading image... 12:48:26
Making masks... 12:48:26
Done at: 12:48:28 | Amount: 10
Making image from mask... 12:48:28
Done...? | 12:48:31 | Time taken: 2.9744369983673096
I found that, on average, vit_l offers the best speed/accuracy trade-off. vit_h is the most accurate but the slowest, and vit_b is the fastest but the least accurate.
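If you want to repeat this kind of comparison on your own images, a rough sketch of a timing harness follows. It is not the script that produced the log above. `benchmark` and `relative_to_fastest` are hypothetical names, and the callables you pass in are assumed to wrap whatever you want to time per model (loading, mask generation, and so on).

```python
import time
from typing import Callable, Dict


def benchmark(runs: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Time each callable once and return wall-clock seconds per label."""
    timings = {}
    for label, fn in runs.items():
        start = time.perf_counter()
        fn()  # e.g. load the model and generate masks for one image
        timings[label] = time.perf_counter() - start
    return timings


def relative_to_fastest(timings: Dict[str, float]) -> Dict[str, float]:
    """Express each timing as a multiple of the fastest entry."""
    fastest = min(timings.values())
    return {label: seconds / fastest for label, seconds in timings.items()}


# "Time taken" values copied from the log above (single run, one image).
reported = {
    "vit_h": 10.918561458587646,
    "vit_l": 5.4358086585998535,
    "vit_b": 2.9744369983673096,
}

for label, ratio in relative_to_fastest(reported).items():
    print(f"{label}: {ratio:.2f}x the time of the fastest model")
```

On the numbers from the log, vit_l takes roughly 1.8x and vit_h roughly 3.7x as long as vit_b. That is a single run on a single image, so treat the ratios as indicative only.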
