trackformer
CUDA out of memory when the sequence lasts too long
Instructions To Reproduce the 🐛 Bug:
- what changes you made (git diff) or what code you wrote:
None
- what exact command you run:
CUDA_VISIBLE_DEVICES=0 python src/track.py with reid dataset_name=DEMO data_root_dir=${IMG_PATH} output_dir=${OUTPUT_PATH} write_images=True frame_range.start=0 frame_range.end=1.0
- what you observed (including full logs):
The GPU memory usage increases as the program runs until a "CUDA out of memory" error is reported.
- please simplify the steps as much as possible so they do not require additional resources to run, such as a private dataset.
My video sequence exceeds 10000 frames, and each frame contains about 10 people. I think the key issue is the steady increase in GPU memory usage.
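For reference, a minimal sketch (generic PyTorch, not part of the trackformer code) of how the per-frame GPU memory growth could be logged to confirm this; the function name and the commented call site are assumptions:

```python
import torch


def log_gpu_memory(frame_idx: int, device: int = 0) -> None:
    # Bytes currently occupied by live tensors on the given device.
    allocated_mb = torch.cuda.memory_allocated(device) / (1024 ** 2)
    # Bytes reserved by the caching allocator (what the process actually holds).
    reserved_mb = torch.cuda.memory_reserved(device) / (1024 ** 2)
    print(f"frame {frame_idx}: allocated={allocated_mb:.1f} MiB, "
          f"reserved={reserved_mb:.1f} MiB")


# Hypothetical usage inside the per-frame tracking loop:
# for frame_idx, frame in enumerate(sequence):
#     tracker.step(frame)
#     log_gpu_memory(frame_idx)
```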
Expected behavior:
The GPU memory usage should remain stable.
Environment:
Provide your environment information using the following command:
My environment is the same as described in INSTALL.md, except for pytorch=1.5.1 and torchvision=0.6.1.
I encountered exactly the same bug.
This might indeed be a bug. We never tested the codebase on sequences with that many frames. During inference, all previous tracks are kept in memory, and for 10000 or an effectively unbounded number of frames this accumulates. One could try to move tracks that are already past the re-identification window to CPU memory.
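A minimal sketch of that suggestion, assuming a hypothetical track object that stores an appearance embedding tensor and the frame index at which it was last detected; the actual attribute and class names in trackformer may differ:

```python
import torch


def offload_stale_tracks(tracks, current_frame: int, reid_window: int) -> None:
    """Move tensors of tracks past the re-identification window to CPU memory.

    Tracks not seen for more than `reid_window` frames can no longer be
    re-identified, so their tensors do not need to stay on the GPU.
    """
    for track in tracks:
        frames_since_seen = current_frame - track.last_seen_frame  # assumed attribute
        if frames_since_seen > reid_window:
            # `.to("cpu")` returns a CPU copy; rebinding the attribute lets the
            # GPU tensor be freed once no other reference holds it.
            track.embedding = track.embedding.to("cpu")
```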
Hi Tim, thank you very much for your time and attention.
Here is what happened to me:
When I tried to run the pre-training as TRAIN.md says:
python src/train.py with crowdhuman deformable multi_frame tracking output_dir=models/crowdhuman_deformable_multi_frame
the GPU memory usage kept increasing slowly until CUDA ran out of memory, and the training just failed.
@HenryZhou19 This is not the same problem as mentioned in the first message of this issue. The original problem was a CUDA out-of-memory issue during inference, not training.
Sorry about that. But I hope my report can help locate the bug.
Hi @czhaneva Did you resolve this issue? I am facing the same problem.
Hi, I also ran into this problem. Have you solved it? Could you please help me with it? @HenryZhou19 @timmeinhardt