custom_d_fine
D-FINE: SoTA Object Detection model custom training/exporting/inferencing pipeline
This is a custom project to work with D-FINE, a state-of-the-art transformer-based object detection model. Model authors' repo: D-FINE.
Main scripts
To run the scripts, use the following commands:
python -m src.etl.preprocess # Converts images and PDFs to JPG format
python -m src.etl.split # Creates train, validation, and test CSVs with image paths
python -m src.dl.train # Runs the training pipeline
python -m src.dl.export # Exports weights in various formats after training
python -m src.dl.bench # Runs all exported models on the test set
python -m src.dl.infer # Runs the model on the test folder, saves visualisations and txt preds
Note: if you don't pass any parameters, you can run any of these scripts with make script_name, for example: make train will run python -m src.dl.train. You can also run a bare make to run all the scripts one by one (excluding the last one, infer).
Usage example
- Clone the repo: git clone https://github.com/ArgoHA/custom_d_fine.git
- For bigger models (l, x), download the weights from gdrive and put them into the pretrained folder
- Prepare your data: an images folder and a labels folder (one txt file per image, in YOLO format)
- Customize config.yaml; minimal example (see the sketch after this list):
  - exp_name. The experiment name, used in the model's output folder. After you train a model, you can run export/bench/infer and they will use the model under this name + current date
  - root. Path to the directory where you store your dataset and where model outputs will be saved
  - data_path. Path to the folder with images and labels
  - label_to_name. Your custom dataset classes
  - model_name. Choose from the n/s/m/l/x model sizes
  - and the usual things like epochs, batch_size, num_workers. Check out config.yaml for all configs
- Run the preprocess and split scripts from the custom_d_fine repo
- Run the train script, changing configurations and iterating until you get the desired results
- Run the export script to create ONNX, TensorRT, OpenVINO models
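A minimal config.yaml sketch based on the keys above; the paths and class names are placeholders, and the exact label_to_name format is an assumption, so check the repo's config.yaml for the real structure:

exp_name: my_experiment
root: /path/to/storage            # dataset and model outputs live here
data_path: /path/to/dataset       # contains images/ and labels/
label_to_name:                    # your custom classes (format is an assumption)
  0: cat
  1: dog
model_name: s                     # one of n / s / m / l / x
epochs: 100
batch_size: 16
num_workers: 8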
If you run the train script passing args on the command line instead of changing them in the config file, you should also pass the changed args to the other scripts, like export or infer. Example:
python -m src.dl.train exp_name=my_experiment
python -m src.dl.export exp_name=my_experiment
Exporting tips
Half precision:
- usually makes sense if your hardware has more FLOPs in fp16
- works best with TensorRT
- for the Torch version, AMP is used when the half flag is true, but if FLOPs are the same for fp32 and fp16, AMP can be slightly slower during inference
- is not used for OpenVINO, as it picks the precision automatically
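As a rough illustration of the Torch half/AMP path, here is a minimal autocast inference sketch; the stand-in model and input shape are assumptions, not the repo's code:

import torch
import torch.nn as nn

model = nn.Conv2d(3, 8, 3).cuda().eval()  # stand-in for the detection model
x = torch.randn(1, 3, 640, 640, device="cuda")
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)  # ops run in fp16 under autocast where safe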
Dynamic input means that during inference we cut the black paddings added by the letterbox. I don't recommend using it with D-FINE, as accuracy degrades too much (probably because of the absolute positional encoding of patches).
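To make the dynamic input idea concrete, here is a minimal letterbox sketch (my own illustration, not the repo's preprocessing code): the image is resized with its aspect ratio kept, black padding fills the rest, and dynamic input crops that padding away before the forward pass.

import cv2
import numpy as np

def letterbox(img: np.ndarray, size: int = 640):
    # Resize keeping the aspect ratio, pad the remainder with black
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(h * scale), int(w * scale)
    resized = cv2.resize(img, (new_w, new_h))
    canvas = np.zeros((size, size, 3), dtype=img.dtype)
    canvas[:new_h, :new_w] = resized
    return canvas, (new_h, new_w)

# Dynamic input: crop the padding so the model only sees the resized image
# canvas, (new_h, new_w) = letterbox(image)
# dynamic = canvas[:new_h, :new_w]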
Inference
Use inference classes in src/infer. Currently available:
- Torch
- TensorRT
- OpenVINO
- ONNX
You can run inference on a folder (path_to_test_data) of images or on a folder of videos. Crops are created automatically; you can control cropping and paddings from config.yaml in the infer section.
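A hypothetical usage sketch; the module path, class name, and constructor arguments below are assumptions, so check the actual classes in src/infer for the real API:

import cv2
from pathlib import Path

from src.infer.torch_model import TorchModel  # hypothetical import path and class name

model = TorchModel(model_path="output/models/my_experiment_2024-01-01", conf_thresh=0.5)  # hypothetical args
for img_path in sorted(Path("path_to_test_data").glob("*.jpg")):
    img = cv2.imread(str(img_path))
    preds = model(img)  # boxes, scores, class ids (exact return format may differ)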
Outputs
- Models: saved during training and export at output/models/exp_name_date. Includes training logs, a table with the main metrics, the confusion matrix, and f1-score_vs_threshold and precision_recall_vs_threshold plots
- Debug images: preprocessed images (including augmentations) saved at output/debug_images/split, exactly as they are fed into the model (except for normalization)
- Evaluation predicts: visualised model predictions on the val set, with GT in green and predictions in blue
- Bench images: visualised model predictions from the inference classes, using all exported models
- Infer: visualised model predictions and predicted annotations in YOLO txt format
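Putting the paths above together, a rough sketch of the output layout; the experiment folder name and the per-folder contents are illustrative:

output/
  models/
    my_experiment_2024-01-01/   # weights, logs, metrics table, plots
  debug_images/
    train/                      # preprocessed and augmented images, per split
    val/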
Results examples
Example images: Train, Benchmarking, WandB, Infer.
Features
- Training pipeline from SoTA D-FINE model
- Export to ONNX, OpenVINO, TensorRT
- Inference class for Torch, TensorRT, OpenVINO on images or videos
- Label smoothing in Focal loss (see the sketch after this list)
- Augs based on the albumentations lib
- Mosaic augmentation, multiscale aug
- Metrics: mAPs, Precision, Recall, F1-score, Confusion matrix, IoU, plots
- After training is done, runs a test to calculate the optimal confidence threshold
- Exponential moving average model
- Batch accumulation
- Automatic mixed precision (40% less vRAM used and 15% faster training)
- Gradient clipping
- Keep ratio of the image and use paddings or use simple resize
- When ratio is kept, inference can be sped up with removal of grey paddings
- Visualisation of preprocessed images, model predictions and ground truth
- Warmup epochs that ignore background images for an easier start of convergence
- OneCycleLR as the scheduler, AdamW as the optimizer
- Unified configuration file for all scripts
- Annotations in YOLO format, splits in csv format
- ETA displayed during training, precise starting from epoch 2
- Logging file with training process
- WandB integration
- Batch inference
- Early stopping
- Gradio UI demo
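As an illustration of the label smoothing in Focal loss feature, here is a minimal sketch of the technique on sigmoid logits; it shows the general idea, not the exact loss implemented in this repo:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0, smoothing=0.1):
    # Smooth hard 0/1 targets towards 0.5 before computing the loss
    targets = targets * (1.0 - smoothing) + 0.5 * smoothing
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1.0 - p) * (1.0 - targets)           # prob of the (soft) target
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()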
TODO
- Finetune with layers freeze
- Add support for caching in the dataset
- Add support for multi GPU training
- Instance segmentation
- Smart dataset preprocessing. Detect small objects. Detect near duplicates (remove from val/test)
Acknowledgement
This project is built upon the original D-FINE repo. Thank you to the D-FINE team for an awesome model!
@misc{peng2024dfine,
title={D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement},
author={Yansong Peng and Hebei Li and Peixi Wu and Yueyi Zhang and Xiaoyan Sun and Feng Wu},
year={2024},
eprint={2410.13842},
archivePrefix={arXiv},
primaryClass={cs.CV}
}