Questions about Multi-GPU training

Open QinSun197 opened this issue 5 months ago • 3 comments

Search before asking

  • [x] I have searched the RF-DETR issues and found no similar bug report.

Bug

Why can't I train RF-DETR on multiple GPUs with the script from the README?

    python -m torch.distributed.launch --nproc_per_node=3 --use_env main.py

    Traceback (most recent call last):
      File "/rf-detr/main.py", line 560, in <module>
        parser = argparse.ArgumentParser('LWDETR training and evaluation script', parents=[get_args_parser()])
    NameError: name 'get_args_parser' is not defined
        parser = argparse.ArgumentParser('LWDETR training and evaluation script', parents=[get_args_parser()])
    NameError: name 'get_args_parser' is not defined
        parser = argparse.ArgumentParser('LWDETR training and evaluation script', parents=[get_args_parser()])
    NameError: name 'get_args_parser' is not defined

Environment

  • OS: Ubuntu 20.04
  • Python: 3.9.0
  • PyTorch: 2.2.0
  • GPU: A30

Minimal Reproducible Example

Image

Additional

No response

Are you willing to submit a PR?

  • [ ] Yes, I'd like to help by submitting a PR!

QinSun197 avatar Jul 21 '25 07:07 QinSun197

You should first create a main.py script that initializes your model and calls .train() as usual, then run it in the terminal, like this. Hope it works.

```python
from rfdetr import RFDETRBase

model = RFDETRBase()

# Replace the placeholder strings with your own dataset and output paths.
model.train(dataset_dir="<DATASET_PATH>", epochs=10, batch_size=4, grad_accum_steps=4, lr=1e-4, output_dir="<OUTPUT_PATH>")
```
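
With `--nproc_per_node=3` the launcher starts three copies of `main.py`, which is probably why the error in the report is printed three times. As a rough sketch, a script can check which process it is by reading the environment variables that `torch.distributed.launch --use_env` sets for each worker. The helper name `read_dist_env` is hypothetical and not part of RF-DETR; the single-process fallback values are an assumption for running the script without a launcher.

```python
import os


def read_dist_env(environ=None):
    """Return (rank, local_rank, world_size) from launcher-set variables.

    Falls back to single-process values (0, 0, 1) when the variables are
    absent, e.g. when the script is started with plain `python main.py`.
    """
    env = os.environ if environ is None else environ
    rank = int(env.get("RANK", 0))              # global index of this process
    local_rank = int(env.get("LOCAL_RANK", 0))  # index of this process on its node
    world_size = int(env.get("WORLD_SIZE", 1))  # total number of processes
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = read_dist_env()
    print(f"process {rank} of {world_size} (local rank {local_rank})")
```

Putting a print like this at the top of your own main.py is a quick way to confirm that one process per GPU was actually started before training begins.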

capsule2077 avatar Jul 22 '25 07:07 capsule2077

Yeah... I think this is a bug. For now I just ignore the main.py.

GoDiao avatar Sep 06 '25 07:09 GoDiao

The main.py is not meant to be used directly.

isaacrob-roboflow avatar Sep 08 '25 14:09 isaacrob-roboflow