Training Fails with AttributeError: 'list' object has no attribute 'popitem' on Custom COCO Dataset
I am trying to train SAM3 on a custom segmentation dataset. The dataset was annotated using CVAT and exported in COCO format. After exporting, I updated the COCO annotations to include a noun_phrase field. Below is an example of an annotation:
"annotations": [
  {
    "id": 13798,
    "image_id": 2042,
    "category_id": 1,
    "segmentation": [
      [
        176.72,
        ..........
        600.0
      ]
    ],
    "area": 2694.0,
    "bbox": [
      176.72,
      407.03,
      122.28,
      192.97
    ],
    "iscrowd": 0,
    "attributes": {
      "occluded": false
    },
    "noun_phrase": "line_mark"
  }
]
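To rule out a dataset problem, a check along the following lines could confirm that every annotation actually carries a usable noun_phrase. This is only a sketch: the helper name find_missing_noun_phrases is made up, and the second annotation in the sample (id 13799) is a hypothetical entry added to show what a failing case looks like.

```python
import json


def find_missing_noun_phrases(coco: dict) -> list:
    """Return the ids of annotations that lack a non-empty string `noun_phrase`."""
    missing = []
    for ann in coco.get("annotations", []):
        phrase = ann.get("noun_phrase")
        if not isinstance(phrase, str) or not phrase.strip():
            missing.append(ann.get("id"))
    return missing


# In-memory sample mirroring the annotation above (fields shortened).
# The second entry is hypothetical and deliberately has no noun_phrase.
sample = {
    "annotations": [
        {"id": 13798, "image_id": 2042, "category_id": 1, "noun_phrase": "line_mark"},
        {"id": 13799, "image_id": 2042, "category_id": 1},
    ]
}

missing_ids = find_missing_noun_phrases(sample)
print(missing_ids)

# For a real file, load it first, e.g.:
# with open("datasets/coco_dataset/sam/train/_annotations.coco.json") as f:
#     print(find_missing_noun_phrases(json.load(f)))
```

An empty list for both the train and val annotation files would suggest the noun_phrase edit itself is not what breaks training.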
I then updated the roboflow_v100_full_ft_100_images.yaml configuration file for segmentation training, based on information from the following GitHub issues:
- https://github.com/facebookresearch/sam3/issues/163
- https://github.com/facebookresearch/sam3/issues/324
I am training the model using the following command:
python sam3/train/train.py \
-c configs/roboflow_v100/roboflow_v100_full_ft_100_images.yaml \
--use-cluster 0 \
--num-gpus 1
However, training fails as soon as the trainer tries to unpack a batch produced by the data loader. Specifically, it raises the following exception:
AttributeError: 'list' object has no attribute 'popitem'
This occurs inside the training loop, in the trainer's _step method, at the line key, batch = batch.popitem(). I have not seen anyone else report the same issue, so I suspect there may be an error in my dataset preparation or in how the batch data is structured. Any help or guidance would be greatly appreciated.
The full traceback is included below, followed by my resolved configuration YAML.
RuntimeError: DataLoader worker (pid 2148617) is killed by signal: Aborted.
[rank0]: Traceback (most recent call last):
[rank0]: File "/home/wakasu/code/sam3/sam3/train/train.py", line 339, in <module>
[rank0]: main(args)
[rank0]: File "/home/wakasu/code/sam3/sam3/train/train.py", line 310, in main
[rank0]: single_node_runner(cfg, main_port)
[rank0]: File "/home/wakasu/code/sam3/sam3/train/train.py", line 71, in single_node_runner
[rank0]: single_proc_run(local_rank=0, main_port=main_port, cfg=cfg, world_size=num_proc)
[rank0]: File "/home/wakasu/code/sam3/sam3/train/train.py", line 58, in single_proc_run
[rank0]: trainer.run()
[rank0]: File "/home/wakasu/code/sam3/sam3/train/trainer.py", line 571, in run
[rank0]: self.run_train()
[rank0]: File "/home/wakasu/code/sam3/sam3/train/trainer.py", line 592, in run_train
[rank0]: outs = self.train_epoch(dataloader)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/home/wakasu/code/sam3/sam3/train/trainer.py", line 813, in train_epoch
[rank0]: self._run_step(batch, phase, loss_mts, extra_loss_mts)
[rank0]: File "/home/wakasu/code/sam3/sam3/train/trainer.py", line 950, in _run_step
[rank0]: loss_dict, batch_size, extra_losses = self._step(
[rank0]: ^^^^^^^^^^^
[rank0]: File "/home/wakasu/code/sam3/sam3/train/trainer.py", line 500, in _step
[rank0]: key, batch = batch.popitem()
[rank0]: ^^^^^^^^^^^^^
[rank0]: AttributeError: 'list' object has no attribute 'popitem'
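The last frame shows that the trainer calls popitem() on the batch, which only works if the batch is a dict. The sketch below reproduces that mismatch in isolation. The two collators are hypothetical stand-ins, not the sam3 implementations: it is an assumption, based only on the name collate_fn_api_with_chunking and its num_chunks option, that the chunking collator returns a list of per-chunk dicts rather than a single dict.

```python
def collate_single(samples, dict_key):
    """Hypothetical stand-in collator: returns ONE dict keyed by dict_key."""
    return {dict_key: samples}


def collate_with_chunking(samples, dict_key, num_chunks):
    """Hypothetical stand-in: splits samples into chunks and returns a LIST of dicts."""
    chunks = [samples[i::num_chunks] for i in range(num_chunks)]
    return [collate_single(chunk, dict_key) for chunk in chunks]


def step(batch):
    """Mirrors the failing line from the traceback: key, batch = batch.popitem()."""
    key, data = batch.popitem()
    return key, data


samples = ["img0"]

# A dict batch unpacks fine.
ok = step(collate_single(samples, "all"))

# A list batch fails, even with a single chunk.
try:
    step(collate_with_chunking(samples, "all", num_chunks=1))
    error = None
except AttributeError as exc:
    error = str(exc)

print(ok)
print(error)
```

If this assumption holds, the error would come from the train collate_fn in the configuration rather than from the annotations. Printing type(batch) just before the failing line in trainer.py would confirm or rule this out.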
paths:
roboflow_vl_100_root: datasets/coco_dataset/sam
experiment_log_dir: datasets/coco_dataset/exp
bpe_path: sam3/assets/bpe_simple_vocab_16e6.txt.gz
roboflow_train:
num_images: null
train_transforms:
- _target_: sam3.train.transforms.basic_for_api.ComposeAPI
transforms:
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterCrowds
- _target_: sam3.train.transforms.point_sampling.RandomizeInputBbox
box_noise_std: 0.1
box_noise_max: 20
- _target_: sam3.train.transforms.segmentation.DecodeRle
- _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
sizes:
_target_: sam3.train.transforms.basic.get_random_resize_scales
size: 600
min_size: 480
rounded: false
max_size:
_target_: sam3.train.transforms.basic.get_random_resize_max_size
size: 600
square: true
consistent_transform: false
- _target_: sam3.train.transforms.basic_for_api.PadToSizeAPI
size: 600
consistent_transform: false
- _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterEmptyTargets
- _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
mean:
- 0.5
- 0.5
- 0.5
std:
- 0.5
- 0.5
- 0.5
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterEmptyTargets
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterFindQueriesWithTooManyOut
max_num_objects: 200
val_transforms:
- _target_: sam3.train.transforms.basic_for_api.ComposeAPI
transforms:
- _target_: sam3.train.transforms.segmentation.DecodeRle
- _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
sizes: 600
max_size:
_target_: sam3.train.transforms.basic.get_random_resize_max_size
size: 600
square: true
consistent_transform: false
- _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
- _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
mean:
- 0.5
- 0.5
- 0.5
std:
- 0.5
- 0.5
- 0.5
loss:
_target_: sam3.train.loss.sam3_loss.Sam3LossWrapper
matcher:
_target_: sam3.train.matcher.BinaryHungarianMatcherV2
focal: true
cost_class: 2.0
cost_bbox: 5.0
cost_giou: 2.0
alpha: 0.25
gamma: 2
stable: false
o2m_weight: 2.0
o2m_matcher:
_target_: sam3.train.matcher.BinaryOneToManyMatcher
alpha: 0.3
threshold: 0.4
topk: 4
use_o2m_matcher_on_o2m_aux: false
loss_fns_find:
- _target_: sam3.train.loss.loss_fns.Boxes
weight_dict:
loss_bbox: 5.0
loss_giou: 2.0
- _target_: sam3.train.loss.loss_fns.IABCEMdetr
weak_loss: false
weight_dict:
loss_ce: 20.0
presence_loss: 20.0
pos_weight: 10.0
alpha: 0.25
gamma: 2
use_presence: true
pos_focal: false
pad_n_queries: 200
pad_scale_pos: 1.0
- _target_: sam3.train.loss.loss_fns.Masks
focal_alpha: 0.25
focal_gamma: 2.0
weight_dict:
loss_mask: 200.0
loss_dice: 10.0
compute_aux: false
loss_fn_semantic_seg:
_target_: sam3.train.loss.loss_fns.SemanticSegCriterion
presence_head: true
presence_loss: false
focal: true
focal_alpha: 0.6
focal_gamma: 2.0
downsample: false
weight_dict:
loss_semantic_seg: 20.0
loss_semantic_presence: 1.0
loss_semantic_dice: 30.0
scale_by_find_batch_size: true
scratch:
enable_segmentation: true
d_model: 256
pos_embed:
_target_: sam3.model.position_encoding.PositionEmbeddingSine
num_pos_feats: 256
normalize: true
scale: null
temperature: 10000
use_presence_eval: true
original_box_postprocessor:
_target_: sam3.eval.postprocessors.PostProcessImage
max_dets_per_img: -1
use_original_ids: true
use_original_sizes_box: true
use_presence: true
matcher:
_target_: sam3.train.matcher.BinaryHungarianMatcherV2
focal: true
cost_class: 2.0
cost_bbox: 5.0
cost_giou: 2.0
alpha: 0.25
gamma: 2
stable: false
scale_by_find_batch_size: true
resolution: 600
consistent_transform: false
max_ann_per_img: 200
train_norm_mean:
- 0.5
- 0.5
- 0.5
train_norm_std:
- 0.5
- 0.5
- 0.5
val_norm_mean:
- 0.5
- 0.5
- 0.5
val_norm_std:
- 0.5
- 0.5
- 0.5
num_train_workers: 10
num_val_workers: 0
max_data_epochs: 20
target_epoch_size: 1500
hybrid_repeats: 1
context_length: 2
gather_pred_via_filesys: false
lr_scale: 0.1
lr_transformer: 8.0e-05
lr_vision_backbone: 2.5e-05
lr_language_backbone: 5.0e-06
lrd_vision_backbone: 0.9
wd: 0.1
scheduler_timescale: 20
scheduler_warmup: 20
scheduler_cooldown: 20
val_batch_size: 1
collate_fn_val:
_target_: sam3.train.data.collator.collate_fn_api
_partial_: true
repeats: 1
dict_key: roboflow100
with_seg_masks: true
gradient_accumulation_steps: 1
train_batch_size: 1
collate_fn:
_target_: sam3.train.data.collator.collate_fn_api_with_chunking
_partial_: true
repeats: 1
dict_key: all
with_seg_masks: true
num_chunks: 1
trainer:
_target_: sam3.train.trainer.Trainer
skip_saving_ckpts: false
empty_gpu_mem_cache_after_eval: true
skip_first_val: true
max_epochs: 20
accelerator: cuda
seed_value: 123
val_epoch_freq: 1
mode: train
gradient_accumulation_steps: 1
distributed:
backend: nccl
find_unused_parameters: true
gradient_as_bucket_view: true
loss:
all:
_target_: sam3.train.loss.sam3_loss.Sam3LossWrapper
matcher:
_target_: sam3.train.matcher.BinaryHungarianMatcherV2
focal: true
cost_class: 2.0
cost_bbox: 5.0
cost_giou: 2.0
alpha: 0.25
gamma: 2
stable: false
o2m_weight: 2.0
o2m_matcher:
_target_: sam3.train.matcher.BinaryOneToManyMatcher
alpha: 0.3
threshold: 0.4
topk: 4
use_o2m_matcher_on_o2m_aux: false
loss_fns_find:
- _target_: sam3.train.loss.loss_fns.Boxes
weight_dict:
loss_bbox: 5.0
loss_giou: 2.0
- _target_: sam3.train.loss.loss_fns.IABCEMdetr
weak_loss: false
weight_dict:
loss_ce: 20.0
presence_loss: 20.0
pos_weight: 10.0
alpha: 0.25
gamma: 2
use_presence: true
pos_focal: false
pad_n_queries: 200
pad_scale_pos: 1.0
- _target_: sam3.train.loss.loss_fns.Masks
focal_alpha: 0.25
focal_gamma: 2.0
weight_dict:
loss_mask: 200.0
loss_dice: 10.0
compute_aux: false
loss_fn_semantic_seg:
_target_: sam3.train.loss.loss_fns.SemanticSegCriterion
presence_head: true
presence_loss: false
focal: true
focal_alpha: 0.6
focal_gamma: 2.0
downsample: false
weight_dict:
loss_semantic_seg: 20.0
loss_semantic_presence: 1.0
loss_semantic_dice: 30.0
scale_by_find_batch_size: true
default:
_target_: sam3.train.loss.sam3_loss.DummyLoss
data:
train:
_target_: sam3.train.data.torch_dataset.TorchDataset
dataset:
_target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
limit_ids: null
transforms:
- _target_: sam3.train.transforms.basic_for_api.ComposeAPI
transforms:
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterCrowds
- _target_: sam3.train.transforms.point_sampling.RandomizeInputBbox
box_noise_std: 0.1
box_noise_max: 20
- _target_: sam3.train.transforms.segmentation.DecodeRle
- _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
sizes:
_target_: sam3.train.transforms.basic.get_random_resize_scales
size: 600
min_size: 480
rounded: false
max_size:
_target_: sam3.train.transforms.basic.get_random_resize_max_size
size: 600
square: true
consistent_transform: false
- _target_: sam3.train.transforms.basic_for_api.PadToSizeAPI
size: 600
consistent_transform: false
- _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterEmptyTargets
- _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
mean:
- 0.5
- 0.5
- 0.5
std:
- 0.5
- 0.5
- 0.5
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterEmptyTargets
- _target_: sam3.train.transforms.filter_query_transforms.FlexibleFilterFindGetQueries
query_filter:
_target_: sam3.train.transforms.filter_query_transforms.FilterFindQueriesWithTooManyOut
max_num_objects: 200
load_segmentation: true
max_ann_per_img: 500000
multiplier: 1
max_train_queries: 50000
max_val_queries: 50000
training: true
use_caching: false
img_folder: datasets/coco_dataset/sam/train/
ann_file: datasets/coco_dataset/sam/train/_annotations.coco.json
shuffle: true
batch_size: 1
num_workers: 10
pin_memory: true
drop_last: true
collate_fn:
_target_: sam3.train.data.collator.collate_fn_api_with_chunking
_partial_: true
repeats: 1
dict_key: all
with_seg_masks: true
num_chunks: 1
val:
_target_: sam3.train.data.torch_dataset.TorchDataset
dataset:
_target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
load_segmentation: true
coco_json_loader:
_target_: sam3.train.data.coco_json_loaders.COCO_FROM_JSON
include_negatives: true
category_chunk_size: 2
_partial_: true
img_folder: datasets/coco_dataset/sam/val/
ann_file: datasets/coco_dataset/sam/val/_annotations.coco.json
transforms:
- _target_: sam3.train.transforms.basic_for_api.ComposeAPI
transforms:
- _target_: sam3.train.transforms.segmentation.DecodeRle
- _target_: sam3.train.transforms.basic_for_api.RandomResizeAPI
sizes: 600
max_size:
_target_: sam3.train.transforms.basic.get_random_resize_max_size
size: 600
square: true
consistent_transform: false
- _target_: sam3.train.transforms.basic_for_api.ToTensorAPI
- _target_: sam3.train.transforms.basic_for_api.NormalizeAPI
mean:
- 0.5
- 0.5
- 0.5
std:
- 0.5
- 0.5
- 0.5
max_ann_per_img: 100000
multiplier: 1
training: false
shuffle: false
batch_size: 1
num_workers: 0
pin_memory: true
drop_last: false
collate_fn:
_target_: sam3.train.data.collator.collate_fn_api
_partial_: true
repeats: 1
dict_key: roboflow100
with_seg_masks: true
model:
_target_: sam3.model_builder.build_sam3_image_model
bpe_path: sam3/assets/bpe_simple_vocab_16e6.txt.gz
device: cpus
eval_mode: false
enable_segmentation: true
checkpoint_path: /home/wakasu/.cache/huggingface/hub//models--facebook--sam3/snapshots/3c879f39826c281e95690f02c7821c4de09afae7/sam3.pt
meters:
val:
roboflow100:
detection:
_target_: sam3.eval.coco_writer.PredictionDumper
iou_type: bbox
dump_dir: datasets/coco_dataset/exp/dumps
merge_predictions: true
postprocessor:
_target_: sam3.eval.postprocessors.PostProcessImage
max_dets_per_img: -1
use_original_ids: true
use_original_sizes_box: true
use_presence: true
gather_pred_via_filesys: false
maxdets: 100
pred_file_evaluators:
- _target_: sam3.eval.coco_eval_offline.CocoEvaluatorOfflineWithPredFileEvaluators
gt_path: datasets/coco_dataset/sam/val/_annotations.coco.json
tide: false
iou_type: bbox
optim:
amp:
enabled: true
amp_dtype: bfloat16
optimizer:
_target_: torch.optim.AdamW
gradient_clip:
_target_: sam3.train.optim.optimizer.GradientClipper
max_norm: 0.1
norm_type: 2
param_group_modifiers:
- _target_: sam3.train.optim.optimizer.layer_decay_param_modifier
_partial_: true
layer_decay_value: 0.9
apply_to: backbone.vision_backbone.trunk
overrides:
- pattern: '*pos_embed*'
value: 1.0
options:
lr:
- scheduler:
_target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
base_lr: 8.0e-05
timescale: 20
warmup_steps: 20
cooldown_steps: 20
- scheduler:
_target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
base_lr: 2.5e-05
timescale: 20
warmup_steps: 20
cooldown_steps: 20
param_names:
- backbone.vision_backbone.*
- scheduler:
_target_: sam3.train.optim.schedulers.InverseSquareRootParamScheduler
base_lr: 5.0e-06
timescale: 20
warmup_steps: 20
cooldown_steps: 20
param_names:
- backbone.language_backbone.*
weight_decay:
- scheduler:
_target_: fvcore.common.param_scheduler.ConstantParamScheduler
value: 0.1
- scheduler:
_target_: fvcore.common.param_scheduler.ConstantParamScheduler
value: 0.0
param_names:
- '*bias*'
module_cls_names:
- torch.nn.LayerNorm
checkpoint:
save_dir: datasets/coco_dataset/exp/checkpoints
save_freq: 0
logging:
tensorboard_writer:
_target_: sam3.train.utils.logger.make_tensorboard_logger
log_dir: datasets/coco_dataset/exp/tensorboard
flush_secs: 120
should_log: true
wandb_writer: null
log_dir: datasets/coco_dataset/exp/logs
log_freq: 10
launcher:
num_nodes: 1
gpus_per_node: 2
experiment_log_dir: datasets/coco_dataset/exp
multiprocessing_context: forkserver
submitit:
account: null
partition: null
qos: null
timeout_hour: 72
use_cluster: true
cpus_per_task: 10
port_range:
- 10000
- 65000
constraint: null