GoalFlow
Training loss question about training_perception
Hi, thank you for sharing GoalFlow!
I'm training the perception model using run_goalflow_training_perception.sh,
but my training loss keeps oscillating and doesn't converge even after 40 epochs.
(The attached TensorBoard screenshot shows the agent box/class and BEV semantic losses.)
Environment:
- 4× V100 (32GB, NVIDIA DGX)
- Batch size: 15 (to fully use VRAM)
- Epochs: 40
- Other configs: default
Questions:
- Is this oscillating loss behavior expected, or am I missing any training setting (LR, warmup, grad clip, etc.)?
- Why does the provided script use batch_size=2? Does GoalFlow assume a specific global batch size or gradient accumulation setting? (See the sketch after this list.)
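For reference, here is a minimal, generic PyTorch sketch (not GoalFlow code) of the settings I'm asking about: linear LR warmup, gradient clipping, and gradient accumulation chosen so that 4 GPUs × batch 2 × 2 accumulation steps would give a global batch of 16. All names and values here are illustrative assumptions, not the repo's actual configuration.

```python
# Generic PyTorch sketch (NOT GoalFlow code) of warmup, gradient clipping,
# and gradient accumulation. With 4 GPUs x batch_size=2 per GPU,
# accum_steps=2 yields an effective global batch of 4 * 2 * 2 = 16.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Linear warmup over the first 500 optimizer steps (illustrative schedule).
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, total_iters=500
)

accum_steps = 2       # hypothetical; tune so the global batch matches the intended setup
max_grad_norm = 1.0   # hypothetical clipping threshold

dummy_batches = ((torch.randn(2, 8), torch.randn(2, 1)) for _ in range(1000))
for step, (x, y) in enumerate(dummy_batches):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()  # scale so accumulated gradients average correctly
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        optimizer.step()
        warmup.step()                # advance the LR schedule per optimizer step
        optimizer.zero_grad(set_to_none=True)
```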
Thanks for any advice!
- The open-source code is the complete version, and the parameters in the shell script are already configured.
- When I train on a 4090 GPU with batch_size=2, it already uses up all the VRAM. Could you try reducing the tf_d_model size and see if that helps?
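A hypothetical sketch of the suggested tf_d_model reduction follows. Only tf_d_model is mentioned above; the other field names and defaults follow a Transfuser-style config dataclass and are assumptions, so check the actual GoalFlow config class before applying this.

```python
# Hypothetical sketch of shrinking the transformer width to fit in VRAM.
# Field names besides tf_d_model are assumed (Transfuser-style config);
# verify against the actual GoalFlow perception config.
from dataclasses import dataclass

@dataclass
class PerceptionConfig:       # stand-in for the real config class
    tf_d_model: int = 256     # transformer width (assumed default)
    tf_d_ffn: int = 1024      # feed-forward width, usually scaled with tf_d_model
    tf_num_layers: int = 3
    tf_num_head: int = 8

# Halving the width (and FFN proportionally) roughly quarters the
# attention/FFN memory; heads must still divide tf_d_model evenly.
cfg = PerceptionConfig(tf_d_model=128, tf_d_ffn=512, tf_num_head=8)
assert cfg.tf_d_model % cfg.tf_num_head == 0
print(cfg)
```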