Add the data serialization option to the COCO-JSON-LOADER
This addresses issue #307. (Closes #307)
When I fine-tuned sam3 on a large dataset (more than 1M 2D images), I ran into a memory leak in the dataloader: RAM usage kept growing across the worker processes.
I then followed this post: https://ppwwyyxx.com/blog/2022/Demystify-RAM-Usage-in-Multiprocess-DataLoader/, and its torch-serialization approach fixed the issue.
To enable torch-serialization, we only need to adjust the coco_json_loader component in the yaml file.
Just add `grouped_serialzation: true` like this:
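For context, the trick from that post can be sketched as follows. The names below (`SerializedList`) are illustrative, not the actual sam3 implementation: the idea is to replace a large Python list of annotation dicts with one flat numpy byte buffer plus an offset array, so that worker processes reading items no longer touch per-object refcounts and stop dirtying the copy-on-write pages shared with the parent process.

```python
import pickle
import numpy as np

class SerializedList:
    """Hold a list of items as one flat byte buffer plus offsets.

    Reading an item touches only two numpy arrays, so Python's
    refcount updates no longer dirty the copy-on-write pages that
    DataLoader workers share with the parent process.
    """

    def __init__(self, items):
        # Pickle each item and record the cumulative end offsets.
        blobs = [pickle.dumps(x, protocol=pickle.HIGHEST_PROTOCOL) for x in items]
        self._addr = np.cumsum([len(b) for b in blobs]).astype(np.int64)
        self._data = np.frombuffer(b"".join(blobs), dtype=np.uint8)

    def __len__(self):
        return len(self._addr)

    def __getitem__(self, idx):
        # Slice the byte buffer between the previous and current offsets.
        start = 0 if idx == 0 else int(self._addr[idx - 1])
        end = int(self._addr[idx])
        return pickle.loads(self._data[start:end].tobytes())

# Usage: wrap the annotation list once in the parent process.
records = [{"id": i, "boxes": [[0, 0, i, i]]} for i in range(3)]
slist = SerializedList(records)
assert slist[1] == records[1]
```

With this structure, the per-item deserialization cost is small compared to the steady RAM growth it avoids when many workers iterate over millions of records.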
```yaml
data:
  train:
    _target_: sam3.train.data.torch_dataset.TorchDataset
    dataset:
      _target_: sam3.train.data.sam3_image_dataset.Sam3ImageDataset
      limit_ids: ${biomedseg2d_train.num_images}
      transforms: ${biomedseg2d_train.train_transforms}
      load_segmentation: ${scratch.enable_segmentation}
      coco_json_loader:
        _target_: sam3.train.data.coco_json_loaders.COCO_FROM_JSON
        category_chunk_size: 2
        grouped_serialzation: true  # !!!
        _partial_: true
```