Memory allocation problem
Hello, sorry to bother you. I am training MaskDINO on a cell nucleus dataset, but I keep running out of memory. With the batch size changed to 2 and num_workers changed to 0, training starts, but it is far too slow. Setting num_workers to even 1 reports a memory allocation failure. I have two A6000 GPUs, but I cannot use both at the same time with distributed training, because the memory cannot be allocated then either. Which parameters should I modify to reduce memory usage?
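A minimal sketch of memory-related overrides, assuming MaskDINO's `train_net.py` accepts trailing `KEY VALUE` config overrides as other detectron2 projects do. `SOLVER.IMS_PER_BATCH`, `DATALOADER.NUM_WORKERS` and `SOLVER.AMP.ENABLED` are standard detectron2 keys (AMP may already be enabled in the base config). `INPUT.IMAGE_SIZE` is assumed to be the large-scale-jitter crop size used by MaskDINO's COCO configs, so check that it exists in your YAML. The values and the relative config path are illustrative, and the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: memory-related overrides for a single-GPU MaskDINO run.
# Values are illustrative; INPUT.IMAGE_SIZE is an assumed key name, verify it in your config.
CONFIG=configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s.yaml

OVERRIDES=(
  SOLVER.IMS_PER_BATCH 2      # total images per iteration, summed over all GPUs
  DATALOADER.NUM_WORKERS 1    # each worker is a separate CPU process using host RAM
  SOLVER.AMP.ENABLED True     # mixed precision reduces activation memory
  INPUT.IMAGE_SIZE 768        # assumed key: smaller training crops use less GPU memory
)

CMD=(python train_net.py --num-gpus 1 --config-file "$CONFIG" "${OVERRIDES[@]}")

# Print the command instead of running it; run "${CMD[@]}" to actually train.
printf '%s ' "${CMD[@]}"
printf '\n'
```

Lowering `INPUT.IMAGE_SIZE` may cost accuracy on small objects such as nuclei, so it is probably worth trying only after the batch size and worker count.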
Here is my dataset information:
[10/18 11:09:21] d2.data.datasets.coco INFO: Loading /share/home/ncu10/Code/AI/Point_label/PointWSSIS/cell_data_root/coco/annotations/instances_train2017.json takes 2.70 seconds.
[10/18 11:09:21] d2.data.datasets.coco INFO: Loaded 432 images in COCO format from /share/home/ncu10/Code/AI/Point_label/PointWSSIS/cell_data_root/coco/annotations/instances_train2017.json
[10/18 11:09:21] d2.data.build INFO: Removed 0 images with no usable annotations. 432 images left.
[10/18 11:09:21] d2.data.build INFO: Distribution of instances among all 80 categories:
| category | #instances | category | #instances | category | #instances |
|:-------------:|:-------------|:------------:|:-------------|:-------------:|:-------------|
| person | 17073 | bicycle | 0 | car | 0 |
| motorcycle | 0 | airplane | 0 | bus | 0 |
| train | 0 | truck | 0 | boat | 0 |
| traffic light | 0 | fire hydrant | 0 | stop sign | 0 |
| parking meter | 0 | bench | 0 | bird | 0 |
| cat | 0 | dog | 0 | horse | 0 |
| sheep | 0 | cow | 0 | elephant | 0 |
| bear | 0 | zebra | 0 | giraffe | 0 |
| backpack | 0 | umbrella | 0 | handbag | 0 |
| tie | 0 | suitcase | 0 | frisbee | 0 |
| skis | 0 | snowboard | 0 | sports ball | 0 |
| kite | 0 | baseball bat | 0 | baseball gl.. | 0 |
| skateboard | 0 | surfboard | 0 | tennis racket | 0 |
| bottle | 0 | wine glass | 0 | cup | 0 |
| fork | 0 | knife | 0 | spoon | 0 |
| bowl | 0 | banana | 0 | apple | 0 |
| sandwich | 0 | orange | 0 | broccoli | 0 |
| carrot | 0 | hot dog | 0 | pizza | 0 |
| donut | 0 | cake | 0 | chair | 0 |
| couch | 0 | potted plant | 0 | bed | 0 |
| dining table | 0 | toilet | 0 | tv | 0 |
| laptop | 0 | mouse | 0 | remote | 0 |
| keyboard | 0 | cell phone | 0 | microwave | 0 |
| oven | 0 | toaster | 0 | sink | 0 |
| refrigerator | 0 | book | 0 | clock | 0 |
| vase | 0 | scissors | 0 | teddy bear | 0 |
| hair drier | 0 | toothbrush | 0 | | |
| total | 17073 | | | | |
[10/18 11:09:21] d2.data.build INFO: Using training sampler TrainingSampler
[10/18 11:09:21] d2.data.common INFO: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[10/18 11:09:21] d2.data.common INFO: Serializing 432 elements to byte tensors and concatenating them all ...
[10/18 11:09:22] d2.data.common INFO: Serialized dataset takes 28.01 MiB
I am using the ResNet-50 model:
[10/18 11:09:13] detectron2 INFO: Rank of current process: 0. World size: 1
[10/18 11:09:14] detectron2 INFO: Environment info:
sys.platform linux
Python 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
numpy 1.24.4
detectron2 0.6 @/share/home/ncu10/Code/AI/Point_label/MaskDINO/detectron2/detectron2
Compiler GCC 9.4
CUDA compiler CUDA 11.4
detectron2 arch flags 8.6
DETECTRON2_ENV_MODULE
CUDA_VISIBLE_DEVICES=1 python train_net.py --num-gpus 1 --config-file /share/home/ncu10/Code/AI/Point_label/MaskDINO/configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s.yaml MODEL.WEIGHTS /share/home/ncu10/Code/AI/Point_label/MaskDINO/model_file/maskdino_r50_50ep_300q_hid1024_3sd1_instance_maskenhanced_mask46.1ap_box51.5ap.pth
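To spread one batch over both A6000s instead, detectron2's `--num-gpus` launches one training process per GPU, and `SOLVER.IMS_PER_BATCH` counts images across all GPUs, so each card holds only its share. A minimal sketch, assuming the two cards are devices 0 and 1 and using the repo-relative config path; `MODEL.WEIGHTS` from the command above is omitted for brevity. It prints the per-GPU share and the launch command rather than starting training.

```shell
#!/usr/bin/env bash
# Sketch: split one total batch across two GPUs (device IDs 0,1 are an assumption).
NUM_GPUS=2
TOTAL_BATCH=4                                  # SOLVER.IMS_PER_BATCH: total over all GPUs
PER_GPU_BATCH=$(( TOTAL_BATCH / NUM_GPUS ))    # images each GPU holds per iteration

echo "images per GPU per iteration: ${PER_GPU_BATCH}"

# Print the launch command; remove the leading "echo" to actually start training.
echo CUDA_VISIBLE_DEVICES=0,1 python train_net.py --num-gpus "${NUM_GPUS}" \
  --config-file configs/coco/instance-segmentation/maskdino_R50_bs16_50ep_3s.yaml \
  SOLVER.IMS_PER_BATCH "${TOTAL_BATCH}" DATALOADER.NUM_WORKERS 1
```

`DATALOADER.NUM_WORKERS` applies per process, so with two GPUs the number of data-loading processes, and the host RAM they use, roughly doubles.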
I get the same error.
Sorry for the late reply. How much memory does it need in your case? In our case, ResNet-50 with batch size 4 uses about 30 GB.
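A back-of-envelope check of the 30 GB figure against an A6000's 48 GB, assuming the 30 GB is measured on a single GPU holding all 4 images and that memory scales linearly with images per GPU. Both are assumptions: part of the 30 GB is fixed (weights, optimizer state, CUDA context), and per-image cost grows with image size and instance count, which is high in this dataset (17073 / 432 ≈ 40 instances per image).

```shell
#!/usr/bin/env bash
# Back-of-envelope: images per GPU that might fit, assuming linear scaling with batch size.
REPORTED_MIB=$(( 30 * 1024 ))     # ~30 GB reported for ResNet-50 at batch size 4
REPORTED_BATCH=4
GPU_MIB=$(( 48 * 1024 ))          # RTX A6000: 48 GB

PER_IMAGE_MIB=$(( REPORTED_MIB / REPORTED_BATCH ))   # whole 30 GB treated as per-image cost
MAX_BATCH=$(( GPU_MIB / PER_IMAGE_MIB ))             # integer division rounds down

echo "per-image estimate: ${PER_IMAGE_MIB} MiB"
echo "images per A6000 (rough): ${MAX_BATCH}"
```

Charging the fixed overhead to every image makes this estimate conservative, so batch size 2 per A6000 would be expected to fit. If failures appear only when num_workers is above 0, that may point to host RAM rather than GPU memory.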