CUDA out of memory when the sequence lasts too long

Open czhaneva opened this issue 1 year ago • 7 comments

Instructions To Reproduce the 🐛 Bug:

  1. what changes you made (git diff) or what code you wrote:
None
  2. what exact command you ran: CUDA_VISIBLE_DEVICES=0 python src/track.py with reid dataset_name=DEMO data_root_dir=${IMG_PATH} output_dir=${OUTPUT_PATH} write_images=True frame_range.start=0 frame_range.end=1.0
  3. what you observed (including full logs):
GPU memory usage increases as the program runs until a "CUDA out of memory" error is reported.
  4. please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset. My video sequence exceeds 10000 frames, and each frame contains about 10 people. I think the key to the problem is the steadily growing GPU memory usage.

Expected behavior:

GPU memory usage should remain stable.

Environment:

Provide your environment information using the following command:

My environment is the same as in INSTALL.md, except for pytorch=1.5.1 and torchvision=0.6.1.

czhaneva avatar May 06 '23 09:05 czhaneva

I encountered exactly the same bug.

HenryZhou19 avatar Aug 09 '23 07:08 HenryZhou19

This might indeed be a bug. We never tested the codebase on sequences with that many frames. During inference, all previous tracks are kept in memory. For 10000 frames or an unbounded number of frames, this accumulates. One could try moving tracks that are already past the re-identification window to CPU memory.
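
The idea of offloading tracks that are past the re-identification window could be sketched roughly as follows. This is a minimal illustration, not TrackFormer code: the `InactiveTrack` container, its `embedding` and `frames_inactive` fields, and the `offload_expired_tracks` helper are all hypothetical names, and the sketch only assumes that each stored embedding exposes a `.to(device)` method, as `torch.Tensor` does.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class InactiveTrack:
    """Hypothetical container; the tracker's real track class may differ."""
    track_id: int
    embedding: Any            # e.g. a torch.Tensor that lives on the GPU
    frames_inactive: int = 0  # frames since the track was last matched


def offload_expired_tracks(tracks: List[InactiveTrack], reid_window: int) -> int:
    """Move embeddings of tracks past the re-ID window to CPU memory.

    Returns how many tracks were past the window.
    """
    expired = 0
    for track in tracks:
        if track.frames_inactive > reid_window:
            # For a torch.Tensor, .to("cpu") returns a CPU copy (or the
            # tensor itself if it is already on the CPU). Rebinding the
            # attribute drops this reference to the GPU copy.
            track.embedding = track.embedding.to("cpu")
            expired += 1
    return expired
```

Calling such a helper once per frame (or every few hundred frames) would keep only tracks that can still be re-identified on the GPU. If old tracks are never needed again, deleting them outright instead of offloading them would be the simpler variant.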

timmeinhardt avatar Aug 09 '23 08:08 timmeinhardt

Hi Tim, thank you very much for your time and attention. Here is what happened to me: when I tried to run the pre-training as TRAIN.md describes, "python src/train.py with crowdhuman deformable multi_frame tracking output_dir=models/crowdhuman_deformable_multi_frame", GPU memory usage kept increasing slowly until CUDA ran out of memory, and the training failed.

HenryZhou19 avatar Aug 09 '23 08:08 HenryZhou19

@HenryZhou19 This is not the same problem as the one described in the first message of this issue. The original problem was a CUDA out-of-memory error during inference, not training.

timmeinhardt avatar Aug 09 '23 08:08 timmeinhardt

Sorry about that. I hope my problem can still help locate the bug.

HenryZhou19 avatar Aug 09 '23 09:08 HenryZhou19

Hi @czhaneva Did you resolve this issue? I am facing the same problem.

chamathabeysinghe avatar Nov 02 '23 07:11 chamathabeysinghe

Hi, I also ran into this problem. Have you solved it? Could you please help me with it? @HenryZhou19 @timmeinhardt

imzhangyd avatar Jan 23 '24 02:01 imzhangyd