
CUDA out of memory

Open · mapengsen opened this issue 1 year ago · 3 comments

When the dataset is large, the following error is raised:

```
Datasets files num is: 1344726
Datasets path is: /root/autodl-tmp/MDT/log_chekpoints/sampleImage/2024_05_10_9
Datasets files num is: 50000
Traceback (most recent call last):
  File "evaluations/fld/eval_image.py", line 81, in <module>
    main()
  File "evaluations/fld/eval_image.py", line 54, in main
    Precision_value = PrecisionRecall(mode="Precision").compute_metric(train_feat, None, gen_feat) # Default precision
  File "/root/autodl-tmp/MDT/evaluations/fld/fld/metrics/PrecisionRecall.py", line 57, in compute_metric
    return self.pct_in_manifold(gen_feat, train_feat).item()
  File "/root/autodl-tmp/MDT/evaluations/fld/fld/metrics/PrecisionRecall.py", line 33, in pct_in_manifold
    nn_dists = self.get_nn_dists(manifold_feat)
  File "/root/autodl-tmp/MDT/evaluations/fld/fld/metrics/PrecisionRecall.py", line 24, in get_nn_dists
    curr_dists = torch.cdist(feat[start:end], feat)
  File "/root/miniconda3/envs/MDT/lib/python3.8/site-packages/torch/functional.py", line 1315, in cdist
    return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 50.10 GiB. GPU 0 has a total capacty of 23.70 GiB of which 996.56 MiB is free. Process 148558 has 22.72 GiB memory in use. Of the allocated memory 21.07 GiB is allocated by PyTorch, and 216.20 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

mapengsen · May 10 '24 08:05

For precision, the distance computation is batched for the gen_feat but not for the train_feat. Does it work if you take a subset of your train_feat?
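A rough sketch of why this blows up and two ways around it. In the traceback, `get_nn_dists` calls `torch.cdist(feat[start:end], feat)`, where `feat` is presumably the full `train_feat` (1,344,726 rows), so every row chunk produces a `chunk x 1,344,726` distance matrix. At float32, a chunk of roughly 10,000 rows works out to about 50 GiB, which is in line with the 50.10 GiB request (the exact chunk size fld uses is an assumption here; I haven't checked its source). The helpers below are hypothetical, not part of fld: `knn_radii_chunked` also chunks the column dimension and keeps a running top-(k+1), so each `cdist` call stays at `row_chunk x col_chunk`. `subsample` is the cheaper option suggested above, since the chunked version still does O(N²) distance work.

```python
import torch


def cdist_chunk_gib(rows: int, cols: int, bytes_per_elem: int = 4) -> float:
    """Size in GiB of a (rows, cols) distance matrix; float32 by default."""
    return rows * cols * bytes_per_elem / 2**30


def knn_radii_chunked(feat: torch.Tensor, k: int = 3,
                      row_chunk: int = 1024, col_chunk: int = 65536) -> torch.Tensor:
    """Hypothetical helper: distance from each row of `feat` to its k-th nearest
    neighbour (not counting itself). Both dimensions are chunked, so each cdist
    call allocates at most row_chunk * col_chunk distances."""
    n = feat.shape[0]
    radii = torch.empty(n, dtype=feat.dtype, device=feat.device)
    for start in range(0, n, row_chunk):
        rows = feat[start:start + row_chunk]
        # Running buffer of the k+1 smallest distances per row; k+1 because
        # each point's distance to itself (~0) is always among them.
        best = torch.full((rows.shape[0], k + 1), float("inf"),
                          dtype=feat.dtype, device=feat.device)
        for cstart in range(0, n, col_chunk):
            d = torch.cdist(rows, feat[cstart:cstart + col_chunk])
            merged = torch.cat([best, d], dim=1)
            # topk(largest=False) returns the smallest values in ascending order.
            best = torch.topk(merged, k + 1, dim=1, largest=False).values
        radii[start:start + rows.shape[0]] = best[:, k]
    return radii


def subsample(feat: torch.Tensor, n: int, seed: int = 0) -> torch.Tensor:
    """Hypothetical helper: a reproducible random subset of n rows."""
    g = torch.Generator().manual_seed(seed)
    idx = torch.randperm(feat.shape[0], generator=g)[:n]
    return feat[idx.to(feat.device)]


# A ~10k-row chunk against all 1,344,726 training features is about 50 GiB,
# while a 1024 x 65536 chunk is 0.25 GiB.
print(f"{cdist_chunk_gib(10_000, 1_344_726):.2f} GiB")
print(f"{cdist_chunk_gib(1024, 65_536):.2f} GiB")
```

For example, `train_feat = subsample(train_feat, 50_000)` before calling `compute_metric` keeps the train side comparable in size to the 50,000 generated samples.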

marcojira · May 26 '24 11:05

Now, why do I end up with a recall of 0? Is this normal?

mapengsen · Jun 03 '24 02:06

That would be unlikely unless your generated data has very low variance or is very out of distribution.
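A toy illustration of both failure modes. It assumes fld's recall follows the usual k-NN manifold definition (Kynkäänniemi et al., 2019): the fraction of real features that fall inside the manifold of generated features. The traceback shows precision as `pct_in_manifold(gen_feat, train_feat)`, so recall is presumably the same test with the roles swapped, but I haven't verified fld's code. `in_manifold_fraction` is a hypothetical standalone re-implementation, and random Gaussians stand in for real embeddings. When generated features have very low variance, their k-NN radii shrink toward 0, so almost no real point lands inside any ball. When they are far out of distribution, nothing overlaps in either direction.

```python
import torch


def in_manifold_fraction(eval_feat: torch.Tensor, manifold_feat: torch.Tensor,
                         k: int = 3) -> float:
    """Hypothetical stand-in for a k-NN manifold test: the fraction of rows of
    eval_feat that fall inside at least one ball centred on a manifold_feat
    row, whose radius is that row's distance to its k-th nearest neighbour."""
    d_man = torch.cdist(manifold_feat, manifold_feat)
    # k+1 smallest per row; index k skips the point's distance to itself.
    radii = d_man.topk(k + 1, dim=1, largest=False).values[:, k]
    d = torch.cdist(eval_feat, manifold_feat)        # (n_eval, n_manifold)
    inside = (d <= radii.unsqueeze(0)).any(dim=1)
    return inside.float().mean().item()


torch.manual_seed(0)
real = torch.randn(2000, 16)
cases = {
    "healthy": torch.randn(2000, 16),             # same distribution as real
    "collapsed": 0.01 * torch.randn(2000, 16),    # very low variance
    "shifted": torch.randn(2000, 16) + 10.0,      # far out of distribution
}

results = {}
for name, gen in cases.items():
    precision = in_manifold_fraction(gen, real)   # generated inside real manifold
    recall = in_manifold_fraction(real, gen)      # real inside generated manifold
    results[name] = (precision, recall)
    print(f"{name:9s} precision={precision:.3f} recall={recall:.3f}")
```

On real features, a quick sanity check is to compare `gen_feat.std(dim=0).mean()` with `train_feat.std(dim=0).mean()`; a much smaller value for the generated side points to the low-variance case.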

marcojira · Jun 03 '24 19:06