
Guidance on Training from Scratch or Fine-Tuning

Open alvarofsan opened this issue 1 year ago • 21 comments

Hi, I would like to ask how many images you would recommend for training a model from scratch, and what weights you would suggest starting with.

My use case is object segmentation on plain backgrounds. The general model currently works quite well for most cases, but there are a few specific scenarios that could be improved. This is why I’m considering training or fine-tuning.

I have a dataset of around 7,000 images at 2K resolution. What would you recommend in this case?

Thank you in advance for your help!

alvarofsan avatar Jan 15 '25 08:01 alvarofsan

For common cases without extremely complicated shapes, 500-1,000 images should be enough for training from scratch. If your cases are very different from the training sets I used for the general version weights, I suggest training from scratch when you have enough images. Otherwise, fine-tuning could be a better choice.

In your case, I recommend training from scratch. BTW, you can check the model efficiency part in the README; using FP16 + compile==True + PyTorch==2.5.1 saves GPU memory, so you can do less downscaling on your 2K data.
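The memory-saving recipe above can be sketched as below. This is only a hypothetical stand-in model, not BiRefNet itself; the CPU/bfloat16 and "eager"-backend choices are just so the snippet runs anywhere. On a CUDA GPU you would use device_type="cuda" with torch.float16 (the FP16 setting) and the default compile backend.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for BiRefNet: a tiny conv head, just to show the recipe.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, 1),
)

# torch.compile (PyTorch >= 2.0) can fuse ops and reduce peak memory.
# The "eager" debug backend is used here only so the sketch runs without a
# C++ toolchain; real training would use the default (inductor) backend.
model = torch.compile(model, backend="eager")

x = torch.randn(2, 3, 64, 64)

# Reduced-precision forward pass. bfloat16 is used here because float16
# convolutions are not generally supported on CPU; on GPU use float16.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.shape, y.dtype)
```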

ZhengPeng7 avatar Jan 15 '25 11:01 ZhengPeng7

Hello,

First of all, thank you for your incredible work and contributions!

I want to train a model specifically for removing backgrounds from car images. I have a dataset of approximately 80,000 images. Could you guide me on the best practices to follow, which model and settings would be most suitable, and whether there are any tutorials available for training or fine-tuning a model?

Roshan-digi5 avatar Jan 16 '25 05:01 Roshan-digi5

I've made a fine-tuning guideline in my README. For the fine-tuning settings, you can use the defaults except for the epochs. If you still have a problem after following it, please tell me.

ZhengPeng7 avatar Jan 16 '25 14:01 ZhengPeng7

Thank you, I will let you know in case of any issue.

Roshan-digi5 avatar Jan 17 '25 04:01 Roshan-digi5

Hi,

Thank you so much for taking the time to reply!

I wanted to ask specifically about the configurations, losses and backbone you would recommend for my use case. Are there any particular hyperparameters or architectures you find especially suitable for this type of task? Any additional guidance would be greatly appreciated.

Thanks again for your support!

alvarofsan avatar Jan 17 '25 09:01 alvarofsan

In my mind, car segmentation should involve fewer fine contour details and no need for transparency. If so, you can train the model for fewer epochs with a higher weight on the IoU loss to accelerate convergence. I may come up with more points in the future, but currently that's all.
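Up-weighting the IoU term as suggested above can be sketched like this. Note this is not the repo's actual loss code (BiRefNet defines its own loss combination); `combined_loss` and `w_iou` are illustrative names for a generic BCE + weighted soft-IoU objective.

```python
import torch

def iou_loss(logits, target, eps=1e-6):
    """Soft IoU loss on sigmoid probabilities (0 = perfect overlap)."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(1, 2, 3))
    union = (prob + target - prob * target).sum(dim=(1, 2, 3))
    return (1.0 - (inter + eps) / (union + eps)).mean()

bce = torch.nn.BCEWithLogitsLoss()

def combined_loss(logits, target, w_iou=2.0):
    # w_iou > 1 up-weights the region-overlap term relative to BCE,
    # pushing the network toward crisp masks and faster convergence.
    return bce(logits, target) + w_iou * iou_loss(logits, target)
```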

ZhengPeng7 avatar Jan 17 '25 09:01 ZhengPeng7

Sorry for not specifying earlier, my use case is object segmentation on a plain background (not cars). Many objects do have transparencies and some small details like tiny holes.

alvarofsan avatar Jan 17 '25 09:01 alvarofsan

That would be a general case. I'm not sure about it (otherwise, I would have added the updates to the default settings).

ZhengPeng7 avatar Jan 17 '25 09:01 ZhengPeng7

Thank you very much again! The model trained with DIS performs really well in most cases, but we have identified some corner cases where it fails. Would you recommend fine-tuning only on those specific cases where it fails (not the entire 7k, just the problematic ones), or fine-tuning on the entire dataset instead?

How much VRAM would I need? I have read it is around 25 GB with FP16?

alvarofsan avatar Jan 21 '25 09:01 alvarofsan

If you find it performs worse on some specific cases, training only on them would help a lot. Hard negative samples usually teach the model more.

Yeah, following the settings there with compile, FP16, and batch_size == 2, training would take ~25 GB.

ZhengPeng7 avatar Jan 21 '25 10:01 ZhengPeng7

I'm following the guidelines you created but am still unable to understand them. I have updated my dataset paths as described up to step 2. After that, what changes need to be made in config.py and train.py? Is there any more guidance, or a Colab demo for fine-tuning?

Roshan-digi5 avatar Feb 05 '25 11:02 Roshan-digi5

OK, thanks for the suggestion. I'll try to record a video of ~1 min to start basic fine-tuning.

ZhengPeng7 avatar Feb 05 '25 11:02 ZhengPeng7

Thank you!

Roshan-digi5 avatar Feb 06 '25 04:02 Roshan-digi5

Oh, by the way, with the latest code, the standard BiRefNet can now be trained on a single 4090; you can check the FP16 GPU memory part in the README.

ZhengPeng7 avatar Feb 13 '25 08:02 ZhengPeng7

okay thank you

Roshan-digi5 avatar Feb 20 '25 06:02 Roshan-digi5

I've created a large dataset by combining data provided by you with my own collection. Now, I want to train a model that performs well not only on plain backgrounds but also on more detailed items such as jewelry and clothing. What would you recommend as the best approach to achieve strong performance across both these scenarios?

Roshan-digi5 avatar Apr 25 '25 10:04 Roshan-digi5

If it's a 0-or-1 segmentation task, use the corresponding loss in the settings. Increasing the IoU fine-tuning epochs may bring some improvement. The most important thing is to make the training data as close as possible to your practical cases.

ZhengPeng7 avatar Apr 25 '25 10:04 ZhengPeng7

Hey guys, I've uploaded the tutorial on fine-tuning BiRefNet with custom data to my YouTube channel: https://youtu.be/FwGT_0V9E-k and Bilibili channel: https://www.bilibili.com/video/BV1dxEkzgE3J

Let me know if you still have questions after watching the screen recording.

ZhengPeng7 avatar May 15 '25 03:05 ZhengPeng7

Hi @ZhengPeng7, I have a dataset of 9600+ train and 2400+ validation images. Both of these include original images and ground-truth masks. I recently watched your video on YouTube and tried to train BiRefNet-general-epoch_244.pth with the dataset I mentioned earlier. I did all the configurations you showed in the video, and finally, when I tried to run this command:

(birefnetenv) ubuntu@ip-123-4-56-789:~/Birefnet-Train$ ./train_test.sh ft-recording 0,1 0
-bash: ./train_test.sh: /bin/sh^M: bad interpreter: No such file or directory

I ran into an error, as you can see. Can you point out what went wrong, or is there another way to start the training with train.py instead of train_test.sh? Thanks in advance!

Jaga-2410 avatar May 20 '25 13:05 Jaga-2410

It's easy to solve. You can change the /bin/sh in the first line to /bin/bash and everything should be okay.
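For reference, the `^M` in the error message indicates the script has Windows (CRLF) line endings; if changing the shebang alone doesn't help, stripping the carriage returns usually does. A minimal Python equivalent of `dos2unix`:

```python
def strip_crlf(path):
    """Convert Windows (CRLF) line endings to Unix (LF), like `dos2unix`."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))

# e.g. strip_crlf("train_test.sh"), then re-run ./train_test.sh
```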

ZhengPeng7 avatar May 21 '25 03:05 ZhengPeng7

Thank you @ZhengPeng7. I'll try it and get back to you.

Jaga-2410 avatar May 21 '25 04:05 Jaga-2410

Hello, I followed the tutorial and trained the model for 200 epochs, but no weights are saved in ckpt. Why? My command was the same as in your YT video, and so were all other configurations.

Roshan-digi5 avatar Aug 08 '25 07:08 Roshan-digi5

Check the fine-tuning guideline in the README again. Did the training really start? To fine-tune for N epochs from pre-trained weights BiRefNet-xxx-epoch_x.pth, the epochs setting should be x + N.
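In other words, the epoch counter resumes from the checkpoint's epoch, so the config value is "checkpoint epoch + desired extra epochs", not just the extra epochs. Variable names below are illustrative, not the repo's:

```python
# Resuming from BiRefNet-general-epoch_244.pth for 200 more epochs:
ckpt_epoch = 244       # x, taken from the checkpoint filename
extra_epochs = 200     # N, how many more epochs to fine-tune
epochs_setting = ckpt_epoch + extra_epochs  # value to put in the config
print(epochs_setting)  # 444
```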

ZhengPeng7 avatar Aug 08 '25 07:08 ZhengPeng7

I set it to 244 + 200 and changed nothing else. I gave the path to "BiRefNet-general-epoch_244.pth" and training started; it is now around epoch 388. Since the total would be around 444 epochs, I should have some files in ckpt, right? All I have is a log.txt,

which has:

2025-08-08 05:42:10,883 INFO @==Final== Epoch[389/444] Training Loss: 5.4996
2025-08-08 05:42:13,773 INFO Epoch[390/444] Iter[0/9953]. Training Losses: bce: 0.12249 | iou: 0.02526 | ssim: 0.037339 | mae: 0.16675 | loss_pix: 1.4073 |
2025-08-08 05:43:55,220 INFO Epoch[390/444] Iter[100/9953]. Training Losses: bce: 0.13709 | iou: 0.024142 | ssim: 0.040086 | mae: |

and the same for every epoch, but no ckpt is saved.

Roshan-digi5 avatar Aug 18 '25 04:08 Roshan-digi5

Really weird... Did you change val_last and step in train.sh to very large numbers? Pay attention to the settings there. Also, you can add a `continue` in the training for-loop, or use a very small dataset, to speed up debugging this problem.

ZhengPeng7 avatar Aug 18 '25 04:08 ZhengPeng7

Okay, I'll try and let you know. Should I change these numbers to 5 or something?

Roshan-digi5 avatar Aug 18 '25 06:08 Roshan-digi5

@Roshan-digi5 Late comment, but were you able to achieve good results with your car segmentation fine-tune, specifically background removal from the windows of cars? If so, would you mind sharing your fine-tuning process?

jeph-the-goat avatar Aug 25 '25 16:08 jeph-the-goat