Where is the fine-tuned model/checkpoint, and how to use it in batch inference?
Dear developers/community,
I tried to follow the instructions from issue 163 to fine-tune a model using: !python sam3/train/train.py -c configs/roboflow_v100/roboflow_v100_full_ft_100_images.yaml --use-cluster 0 --num-gpus 1
Now I was expecting a .pt checkpoint file somewhere, but cannot find it. The training did not produce any errors. I would now like to use the fine-tuned model for batch inference on new images, but so far have no idea how to do that. Maybe somebody can get me jump-started (I am not really an experienced coder).
Thanks in advance and best Matthias
In my case it was caused by the config.
If skip_saving_ckpts: true is set in the training config, SAM3 will not write any checkpoints even though training runs normally. Setting it to false restores checkpoint saving.
For the maintainers: Happy to add a brief docs update based on the discussion in https://github.com/facebookresearch/sam3/issues/163. Just let me know if that would be helpful
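For reference, the relevant line in the training YAML would look roughly like this (the exact nesting may differ between configs; treat it as a sketch):

```yaml
# in your training config (e.g. configs/my_training.yaml)
skip_saving_ckpts: false  # setting this to true suppresses all checkpoint writing
```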
Dear Mattia,
thanks a lot. Now, as you seem to be a vastly more knowledgeable, do you have any suggestion how I would have to go about using the fine-tuned .pt to do inference on a folder of images. Would really appreciate your help and advice.
Thanks and best Matthias
hey!
you should be able to do something like this when loading your model:
CHECKPOINT_PATH = "/path/to/your/fine_tuned_model.pt"
model = build_sam3_image_model(bpe_path=bpe_path, checkpoint_path=CHECKPOINT_PATH, load_from_HF=False)
Dear Mattia,
you have been of quite some help so far, but unfortunately I ran into problems again. Maybe you have a suggestion. When I run:

from sam3 import build_sam3_image_model

bpe_path = f"{sam3_root}/assets/bpe_simple_vocab_16e6.txt.gz"
CHECKPOINT_PATH = "/home/matthias/sam3/runs/actions/checkpoints/checkpoint.pt"
model = build_sam3_image_model(bpe_path=bpe_path, checkpoint_path=CHECKPOINT_PATH, load_from_HF=False)
I get the following lengthy error (just showing the first lines here):
loaded /home/matthias/sam3/runs/actions/checkpoints/checkpoint.pt and found missing and/or unexpected keys: missing_keys=['backbone.vision_backbone.trunk.pos_embed', 'backbone.vision_backbone.trunk.patch_embed.proj.weight', 'backbone.vision_backbone.trunk.blocks.0.norm1.weight', 'backbone.vision_backbone.trunk.blocks.0.norm1.bias', ....
It keeps going on like this.
Now, maybe I have done something wrong when fine-tuning with my own images. I had a couple of images labeled with bounding boxes using CVAT. The training with this command:
!python sam3/train/train.py -c configs/my_training.yaml --use-cluster 0 --num-gpus 1
went fine, no errors and a checkpoint file was created. However, using this checkpoint resulted in what I have written above.
There is an issue related to missing_keys, where a solution is offered:
model = build_sam3_image_model(enable_inst_interactivity=True, load_from_HF=False)  # only build the model
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model = sam3.train.utils.checkpoint_utils.load_state_dict_into_model(model=model, state_dict=checkpoint["model"])
This, however, does not work for me, so I suspect I am doing something profoundly wrong.
Maybe you have an idea, which of course would be greatly appreciated. I am trying to use SAM3 for segmenting scientific images, namely electron-microscopy images, with the goal of segmenting particular organelles in these images. So far I have used YOLO to get BBs or OBBs, which works nicely, but in order to reconstruct these organelles from serial sections in 3D, I would need semantic or instance segmentation, i.e. masks in the end.
Thanks and best,
Matthias
I have the same problem.
If you get a "found missing and/or unexpected keys" error, that means the keys in the saved checkpoint don't match the original model. It could be that the checkpoint includes training-related data (e.g. the optimizer state or some config info), or that you used a different model config during training compared to when loading the checkpoint (the SAM3 code renames some of the weights at runtime, based on config).
The simplest thing to do would be to save the keys of the original model and the keys of your checkpoint file, then compare them. You can save the keys to a .json file using something like:
import torch
import json
# Change these!
path_to_weights = "/path/to/model.pt"
save_name = "model_keys.json"
weight_keys = list(torch.load(path_to_weights, map_location="cpu").keys())
with open(save_name, "w") as outfile:
json.dump(weight_keys, outfile, indent=2)
You'd want to run this twice: once to save the keys of the original (sam3.pt) model, and again to save the keys of your checkpoint. Then you can read both files in a text editor to see what's different between your checkpoint and the original model, which should make it easier to see what needs to be changed.
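If you'd rather compare the two key sets directly in Python instead of eyeballing JSON files, a small helper like this works (`compare_keys` is a hypothetical name, not part of SAM3):

```python
def compare_keys(original_sd, finetuned_sd):
    """Return (keys only in the original, keys only in the fine-tuned checkpoint)."""
    # Training checkpoints often wrap the weights, e.g. under a "model" key
    if isinstance(finetuned_sd.get("model"), dict):
        finetuned_sd = finetuned_sd["model"]
    orig_keys = set(original_sd)
    ft_keys = set(finetuned_sd)
    return sorted(orig_keys - ft_keys), sorted(ft_keys - orig_keys)

# Usage with real files (paths are placeholders):
#   import torch
#   original = torch.load("sam3.pt", map_location="cpu")
#   finetuned = torch.load("checkpoint.pt", map_location="cpu")
#   missing, extra = compare_keys(original, finetuned)
#   print("only in original:", missing[:20])
#   print("only in fine-tuned:", extra[:20])
```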
Claude Opus 4.5 did what I could not do by myself. The code below returns bounding boxes only, no segmentation masks. Anyway, for now I am quite happy with this solution. There is some funny formatting when I post this code cell; everything, however, is in one cell of a Jupyter notebook. Sorry about that.
from sam3.model_builder import build_sam3_image_model
from sam3.model.sam3_image_processor import Sam3Processor
from PIL import Image
import torch
import os
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

# TF32 for Ampere GPUs (RTX 30xx/40xx, A100, etc.)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Use bfloat16 for less VRAM usage and faster inference
# If you get errors, try torch.float16 instead
torch.autocast("cuda", dtype=torch.bfloat16).__enter__()

# Disable gradient tracking for inference
torch.inference_mode().__enter__()

sam3_root = "path to sam3 root"
bpe_path = f"{sam3_root}/assets/bpe_simple_vocab_16e6.txt.gz"
CHECKPOINT_PATH = "path to checkpoint.pt"

model = build_sam3_image_model(
    bpe_path=bpe_path,
    checkpoint_path=None,
    load_from_HF=False,
    enable_segmentation=True,
)

checkpoint = torch.load(CHECKPOINT_PATH, map_location='cuda')
result = model.load_state_dict(checkpoint['model'], strict=False)

print("Model loaded!")
print(f" - Missing keys (segmentation head): {len(result.missing_keys)}")
print(f" - Unexpected keys: {len(result.unexpected_keys)}")

processor = Sam3Processor(model)

image_dir = '/home/matthias/sam3/'
image_files = [f for f in os.listdir(image_dir) if f.endswith(('.jpg', '.png', '.jpeg'))]
images = [Image.open(os.path.join(image_dir, f)) for f in image_files]

prompts_per_image = ["whatever you are looking for"]

print(f"Found {len(images)} images: {image_files}")
print(f"Will apply {len(prompts_per_image)} prompts to each image")

def batch_inference(images, prompts_per_image):
    results = []
    with torch.no_grad():
        for img_idx, image in enumerate(images):
            print(f"Processing image {img_idx + 1}/{len(images)}...")
            inference_state = processor.set_image(image)
            image_results = {"boxes": [], "scores": [], "prompts": []}
            for prompt in prompts_per_image:
                output = processor.set_text_prompt(state=inference_state, prompt=prompt)
                if output.get("boxes") is not None:
                    image_results["boxes"].append(output["boxes"])
                if output.get("scores") is not None:
                    image_results["scores"].append(output["scores"])
                image_results["prompts"].append(prompt)
            results.append(image_results)
            torch.cuda.empty_cache()
    return results

def display_results(images, results, image_files):
    num_images = len(images)
    cols = min(3, num_images)
    rows = (num_images + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(18, 6 * rows))
    if num_images == 1:
        axes = [axes]
    else:
        axes = axes.flatten()
    colors = ['red', 'blue', 'green', 'yellow', 'purple', 'orange', 'cyan', 'magenta']
    for idx, (image, result, filename) in enumerate(zip(images, results, image_files)):
        ax = axes[idx]
        # grayscale / integer / float images need an explicit colormap
        if image.mode in ('L', 'I', 'F'):
            ax.imshow(image, cmap='gray')
        else:
            ax.imshow(image)
        ax.set_title(filename, fontsize=10)
        ax.axis('off')
        color_idx = 0
        for prompt_idx, prompt in enumerate(result["prompts"]):
            if prompt_idx < len(result["boxes"]) and result["boxes"][prompt_idx] is not None:
                boxes = result["boxes"][prompt_idx]
                if torch.is_tensor(boxes):
                    boxes = boxes.float().cpu().numpy()
                scores = None
                if prompt_idx < len(result["scores"]) and result["scores"][prompt_idx] is not None:
                    scores = result["scores"][prompt_idx]
                    if torch.is_tensor(scores):
                        scores = scores.float().cpu().numpy()
                for i, box in enumerate(boxes):
                    color = colors[color_idx % len(colors)]
                    color_idx += 1
                    x1, y1, x2, y2 = box[:4]
                    width = x2 - x1
                    height = y2 - y1
                    rect = patches.Rectangle((x1, y1), width, height,
                                             linewidth=2, edgecolor=color, facecolor='none')
                    ax.add_patch(rect)
                    score_text = ""
                    if scores is not None and i < len(scores):
                        score_text = f": {scores[i]:.2f}"
                    ax.text(x1, y1 - 5, f"{prompt}{score_text}", color=color,
                            fontsize=8, fontweight='bold',
                            bbox=dict(boxstyle='round', facecolor='white', alpha=0.7))
    for idx in range(num_images, len(axes)):
        axes[idx].axis('off')
    plt.tight_layout()
    plt.show()

if len(images) > 0:
    results = batch_inference(images, prompts_per_image)
    print(f"Done! Processed {len(results)} images with {len(prompts_per_image)} prompts each.")
    display_results(images, results, image_files)
else:
    print(f"No images found in directory: {image_dir}")
I was able to get my fine-tuned checkpoint working by adding a 'detector.' prefix to the keys of the model["model"] state dict.
import torch
from collections import OrderedDict
checkpoint = "checkpoint.pt"
wrapped_model = torch.load(checkpoint, map_location="cpu")
model = wrapped_model["model"]
new_state_dict = OrderedDict(("detector." + k, v) for k, v in model.items())
torch.save(new_state_dict, checkpoint.replace(".pt", "_converted.pt"))
Hope this helps someone who ran into the same issue.
@metalisai Thank you for your method, but I find that after adding 'detector.', the original model still has more keys than the fine-tuned model. At the same time, the model can be loaded normally, but inference with the fine-tuned model still reports missing keys: the target was detected, but no corresponding mask was generated.
This is my specific configuration file.
@metalisai I have solved the issue. Thank you!
@metalisai Right now SAM3 has only open-sourced the segmentation task, so many layers are missing, including many of the "track*" layers: https://github.com/facebookresearch/sam3/issues/270#issuecomment-3782909108
@SpiderJack0516 Yes, it's for segmentation fine-tuning. Copying the missing keys from the official model worked for me back when I needed it.
import torch
from collections import OrderedDict
# alternative solution here: https://github.com/facebookresearch/sam3/issues/260
# (create empty SAM3 model and load the weights into it)
checkpoint = "checkpoint.pt"
interactive = True
wrapped_model = torch.load(checkpoint, map_location="cpu")
model = wrapped_model["model"]
new_state_dict = OrderedDict(("detector." + k, v) for k, v in model.items())
if interactive:
print("Loading interaction weights from official SAM3 model...")
meta_cp = "/home/<youruser>/.cache/huggingface/hub/models--facebook--sam3/snapshots/<snapshotidhere>/sam3.pt"
official_model = torch.load(meta_cp, map_location="cpu")
# copy interaction weights
for k, v in official_model.items():
if k.startswith("tracker.") or "sam2_convs" in k:
new_state_dict[k] = v
torch.save(new_state_dict, checkpoint.replace(".pt", "_converted.pt"))
@metalisai Thank you very much! Looking forward to SAM3 opening up the tracker.
Hi, I opened issue #434 on what I think might be related to the missing "detector" prefix problem described by @metalisai and @SpiderJack0516. Has anyone found a clean way to fix the keys-format issue during a fine-tuning/training run? Editing the weights after fine-tuning seems like quite a big inconvenience and might even pose the risk of corrupting/changing them in some other way. I would greatly appreciate your help! Thanks!
Has anyone found a clean way to fix the "Keys format" issue during a fine-tuning/training run?
The easiest thing would probably be to undo the renaming the model does on load, just before saving the fine-tuned weights: for example, undoing the removal of the 'detector.' prefix, or converting back to the 'tracker.' prefix, depending on how the model is being loaded for training.
Though if the model used in fine-tuning isn't even loading the tracker components, then you might be forced to copy the original keys/weights back into the checkpoint (as above), though again this may be more convenient to do as part of saving the fine-tuned result.
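As a sketch, "undoing the renaming before saving" could look like this (the helper name and exact prefixes are assumptions based on this thread, not SAM3 API):

```python
def to_original_key_format(detector_sd, tracker_sd=None):
    """Hypothetical helper: re-add the 'detector.' prefix that SAM3 strips
    at load time, and optionally merge tracker weights back in, so the saved
    checkpoint matches the original sam3.pt key format."""
    out = {"detector." + k: v for k, v in detector_sd.items()}
    if tracker_sd is not None:
        out.update(tracker_sd)  # e.g. keys already starting with "tracker."
    return out

# In a training script this would run just before saving, e.g.:
#   torch.save(to_original_key_format(model.state_dict()), "checkpoint_converted.pt")
```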