
train loss about training_perception

[Open] Ahnhojin1223 opened this issue · 1 comment

Hi, thank you for sharing GoalFlow!

I'm training the perception model using run_goalflow_training_perception.sh,
but my training loss keeps oscillating and doesn't converge even after 40 epochs.
(Attached TensorBoard screenshot shows losses for agent box/class and BEV semantic.)

Environment:

  • 4× V100 (32GB, NVIDIA DGX)
  • Batch size: 15 (to fully use VRAM)
  • Epochs: 40
  • Other configs: default

Questions:

  1. Is this oscillating loss behavior expected, or am I missing any training setting (LR, warmup, grad clip, etc.)?
  2. Why does the provided script use batch_size=2?
    Does GoalFlow assume a specific global batch size or gradient accumulation setting?
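For context on question 2: a common way to emulate a larger global batch size without more VRAM is gradient accumulation. A minimal PyTorch-style sketch of the pattern (the `model`, `loss_fn`, and `loader` names here are generic placeholders, not GoalFlow code):

```python
import torch

def train_epoch(model, loss_fn, loader, optimizer, accum_steps=8):
    """Emulate global batch = micro-batch size * accum_steps by
    accumulating gradients over several forward/backward passes
    before each optimizer step."""
    optimizer.zero_grad()
    for step, (x, y) in enumerate(loader):
        # Divide by accum_steps so the accumulated gradient matches
        # the average over one large batch.
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

If the repo's script was tuned for an effective batch of 2 per GPU, changing it to 15 may also call for rescaling the learning rate; whether GoalFlow relies on a specific effective batch size is exactly what the question above asks.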

Thanks for any advice!

Ahnhojin1223 · Oct 23 '25

  1. The open-source code is the complete version, and the parameters in the shell script are already configured as intended.
  2. When I train on a 4090 GPU with batch_size=2, it already uses up all the VRAM. Could you try reducing tf_d_model and see if that helps?
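Reducing tf_d_model helps because both parameter count and activation memory in a transformer scale roughly quadratically with the model width. A back-of-envelope sketch (pure Python, illustrative constants only, not GoalFlow's actual architecture):

```python
def transformer_params(d_model, n_layers, ffn_mult=4):
    """Rough parameter count per transformer stack:
    attention projections ~ 4*d^2, feed-forward ~ 2*ffn_mult*d^2 per layer."""
    per_layer = 4 * d_model**2 + 2 * ffn_mult * d_model**2
    return n_layers * per_layer

# Halving d_model cuts parameters (and, roughly, activation memory) ~4x:
print(transformer_params(512, 6) / transformer_params(256, 6))  # → 4.0
```

So dropping tf_d_model by even a modest factor frees a disproportionate amount of VRAM, at some cost in model capacity.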

ZebinX · Nov 26 '25