trackformer CUDA out of memory when the sequence lasts too long

Instructions To Reproduce the 🐛 Bug:

what changes you made (git diff) or what code you wrote

None

what exact command you run: CUDA_VISIBLE_DEVICES=0 python src/track.py with reid dataset_name=DEMO data_root_dir=${IMG_PATH} output_dir=${OUTPUT_PATH} write_images=True frame_range.start=0 frame_range.end=1.0
what you observed (including full logs):

GPU memory usage keeps increasing as the program runs until a "CUDA out of memory" error is reported.

please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset. My video sequence exceeds 10000 frames, and each frame contains about 10 people. I think the key to the problem is the steady growth in GPU memory usage.
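To confirm that the growth is steady rather than a single spike, GPU memory can be logged every few hundred frames. The following is a minimal sketch, not part of the trackformer codebase: the helper names `gpu_memory_mib` and `log_gpu_memory` and the logging interval are hypothetical, and only the standard PyTorch calls `torch.cuda.memory_allocated` and `torch.cuda.memory_reserved` are assumed.

```python
try:
    import torch
except ImportError:  # lets the helper be imported on machines without PyTorch
    torch = None


def gpu_memory_mib(device_index: int = 0):
    """Return (allocated, reserved) GPU memory in MiB, or None if CUDA is unavailable."""
    if torch is None or not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated(device_index) / 2**20
    reserved = torch.cuda.memory_reserved(device_index) / 2**20
    return allocated, reserved


def log_gpu_memory(frame_id: int, every: int = 500, device_index: int = 0):
    """Print memory usage on every `every`-th frame; return the reading or None."""
    if frame_id % every != 0:
        return None
    usage = gpu_memory_mib(device_index)
    if usage is not None:
        print(f"frame {frame_id}: allocated={usage[0]:.1f} MiB, "
              f"reserved={usage[1]:.1f} MiB")
    return usage
```

Calling `log_gpu_memory(frame_id)` once per processed frame would show whether the allocated figure climbs roughly linearly with the frame count, which would point to per-frame state being retained.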

Expected behavior:

GPU memory usage should remain stable.

Environment:

Provide your environment information using the following command:

My environment is the same as in INSTALL.md, except for pytorch=1.5.1 and torchvision=0.6.1.

May 06 '23 09:05 czhaneva

I encountered exactly the same bug.

Aug 09 '23 07:08 HenryZhou19

This might indeed be a bug. We never tested the codebase on sequences with that many frames. During inference, all previous tracks are kept in memory. For 10000 or an infinite number of frames, this will accumulate. One could try moving tracks that are already past the re-identification window to CPU memory.
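The suggestion of moving old tracks to CPU memory could be sketched roughly as follows. This is an illustration under stated assumptions, not trackformer's actual code: the `Track` dataclass, its fields, and `offload_stale_tracks` are hypothetical names, and the per-track data is assumed to expose a `.to(device)` method the way `torch.Tensor` does.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Track:
    """Hypothetical track record; not trackformer's own Track class."""
    track_id: int
    last_seen_frame: int
    embedding: Any  # e.g. a torch.Tensor that currently lives on the GPU
    offloaded: bool = False


def offload_stale_tracks(tracks: List[Track], current_frame: int,
                         reid_window: int, device: str = "cpu") -> int:
    """Move embeddings of tracks outside the re-ID window to `device`.

    Returns the number of tracks moved by this call.
    """
    moved = 0
    for track in tracks:
        stale = current_frame - track.last_seen_frame > reid_window
        if stale and not track.offloaded:
            # .to() returns a new object; rebinding drops the reference
            # to the GPU copy so its memory can be reclaimed.
            track.embedding = track.embedding.to(device)
            track.offloaded = True
            moved += 1
    return moved
```

Calling such a function once per frame (or every N frames) would keep only the tracks that can still be re-identified on the GPU. Whether this alone stops the growth depends on what else each track holds on to.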

Aug 09 '23 08:08 timmeinhardt

Hi Tim, thank you very much for your time and attention. Here is what happened to me: when I ran the pre-training command from TRAIN.md, "python src/train.py with crowdhuman deformable multi_frame tracking output_dir=models/crowdhuman_deformable_multi_frame", GPU memory usage kept increasing slowly until CUDA ran out of memory, and the training failed.

Aug 09 '23 08:08 HenryZhou19

@HenryZhou19 This is not the same problem as the one described in the first message of this issue. The original problem was a CUDA out-of-memory error during inference, not training.

Aug 09 '23 08:08 timmeinhardt

Sorry about that. But I hope my problem helps locate the bug.

Aug 09 '23 09:08 HenryZhou19

Hi @czhaneva, did you resolve this issue? I am facing the same problem.

Nov 02 '23 07:11 chamathabeysinghe

Hi Tim, thank you very much for your time and attention. Here is what happened to me: when I ran the pre-training command from TRAIN.md, "python src/train.py with crowdhuman deformable multi_frame tracking output_dir=models/crowdhuman_deformable_multi_frame", GPU memory usage kept increasing slowly until CUDA ran out of memory, and the training failed.

Hi, I also ran into this problem. Have you solved it? Could you please help me with it? @HenryZhou19 @timmeinhardt

Jan 23 '24 02:01 imzhangyd
