
OutOfMemoryError with PrototypicalCalibrationBlock

Open gladdduck opened this issue 10 months ago • 5 comments

Hello, while training on my dataset with DeFRCN I encountered an issue. The base training runs smoothly, but when I attempt K-shot fine-tuning I keep getting an OutOfMemoryError.

I tried to debug it and found that the error does not occur when PCB_ENABLE is set to False.

However, with PCB_ENABLE set to True, I still hit the OutOfMemoryError on an A100-40G even after reducing IMS_PER_BATCH to 1.

Has anyone else experienced a similar issue? How was it resolved?

gladdduck avatar Apr 10 '24 13:04 gladdduck

Solution: locate the PCB module at /path/defrcn/defrcn/evaluation/calibration_layer.py. In the build_prototypes function, right after the line `all_features.append(features.cpu().data)`, add the line `features = None`.

cnjhh avatar Apr 11 '24 08:04 cnjhh
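The fix works because dropping the Python reference right after the CPU copy lets PyTorch's caching allocator reuse that memory on the next image. A minimal sketch, assuming a loop shaped like build_prototypes (everything except the two lines quoted in the thread is illustrative, not DeFRCN's actual code):

```python
import torch

def build_prototypes_sketch(model, images):
    """Hedged reconstruction of the loop around the fix; `model` stands in
    for the extract_roi_features call in DeFRCN's PCB module."""
    all_features = []
    with torch.no_grad():                  # keep no autograd graph alive
        for img in images:
            features = model(img)          # stands in for extract_roi_features
            all_features.append(features.cpu().data)
            features = None                # the fix: drop the GPU reference
    return torch.cat(all_features, dim=0)

# toy usage on CPU, with a linear layer as the feature extractor
model = torch.nn.Linear(4, 2)
imgs = [torch.randn(3, 4) for _ in range(5)]
protos = build_prototypes_sketch(model, imgs)
print(protos.shape)  # torch.Size([15, 2])
```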


thanks for your reply! this works!

gladdduck avatar Apr 12 '24 05:04 gladdduck


However, this error still occurs from time to time. The relevant code is in the build_prototypes function of calibration_layer.py, at `features = self.extract_roi_features(img, boxes)`, which in turn calls `conv_feature = self.imagenet_model(images.tensor[:, [2, 1, 0]])` inside extract_roi_features. I'm very confused by this, even though I used gc.collect() and torch.cuda.empty_cache().

gladdduck avatar Apr 19 '24 05:04 gladdduck
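This behavior is expected: gc.collect() and torch.cuda.empty_cache() can only release memory that nothing references any more, so a feature tensor that is still reachable from Python keeps its allocation. A minimal CPU-safe sketch of that rule (illustrative; empty_cache() is a no-op when CUDA is not initialized):

```python
import gc
import torch

def retained_numel():
    """Show why gc.collect()/empty_cache() did not help: they cannot
    free a tensor that the code still holds a reference to."""
    x = torch.randn(1024, 1024)   # stand-in for a retained feature tensor
    gc.collect()
    torch.cuda.empty_cache()      # returns only *unreferenced* cached blocks
    n = x.numel()                 # x is still alive: its memory stays allocated
    del x                         # drop the last reference...
    gc.collect()
    torch.cuda.empty_cache()      # ...only now can its block be released
    return n

print(retained_numel())  # 1048576
```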

features = self.extract_roi_features(img, boxes)
boxes = None
img = None
all_features.append(features.cpu().data)
features = None

The features are built from your custom dataset for the novel classes. You can solve this by reducing the number of novel classes, or by generating the features offline: instead of loading the novel data every time the model is validated, save the prototypes with the pickle module, then modify the code to load the offline ones directly during validation.

cnjhh avatar Apr 19 '24 08:04 cnjhh
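The offline workflow suggested above can be sketched as follows; the cache path and helper name are assumptions, not part of DeFRCN:

```python
import os
import pickle
import tempfile
import torch

# hypothetical cache path; in practice this would live next to the checkpoints
PROTO_CACHE = os.path.join(tempfile.gettempdir(), "pcb_prototypes.pkl")

def get_prototypes(build_fn, cache_path=PROTO_CACHE):
    """Build class prototypes once and reuse them from disk afterwards,
    so validation never re-runs the expensive GPU extraction pass."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    protos = build_fn()                 # expensive pass runs only once
    with open(cache_path, "wb") as f:
        pickle.dump(protos, f)
    return protos

# usage with a dummy builder standing in for PCB's build_prototypes
if os.path.exists(PROTO_CACHE):
    os.remove(PROTO_CACHE)
protos = get_prototypes(lambda: {c: torch.zeros(128) for c in range(13)})
print(len(protos))  # 13
```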

The device I use is an A800 80G, and the novel data I set is 10-shot with 13 classes. When the model is loaded with the PCB module, GPU memory usage reaches 53G; before the modification, even 80G was not enough.

cnjhh avatar Apr 19 '24 08:04 cnjhh