
Enhancing Model Robustness for Argentine License Plate OCR in Varied Lighting Conditions

Open yihong1120 opened this issue 1 year ago • 6 comments

Dear Maintainers,

I hope this message finds you well. I have been exploring the remarkable work done on the Argentine License Plate OCR repository and am thoroughly impressed with the system's performance, particularly given the constraints of embedded system deployment.

However, I have observed that the model's performance can be significantly impacted by varying lighting conditions, which is a common scenario in real-world applications. In low-light or overexposed environments, the accuracy of character recognition appears to be compromised.

Given the importance of reliable license plate recognition across all times of day and under diverse lighting, I believe enhancing the model's robustness in this aspect could greatly improve its utility. Here are a few suggestions that might help address this issue:

  1. Dynamic Range Adjustment: Implementing an algorithm to normalize the lighting of the input images could help the model perform consistently. This could involve techniques like histogram equalization or adaptive gamma correction.

  2. Lighting Augmentation in Training: To better prepare the model for different lighting scenarios, we could introduce a wider range of lighting conditions in the training data augmentation pipeline. This might include simulating underexposure, overexposure, and shadow effects.

  3. Dedicated Low-light Model: Training a specialized model on a dataset predominantly composed of low-light images might yield a more robust performance in such conditions. This model could either be used in tandem with the primary model or be triggered based on the detected lighting conditions.

  4. Inference-time Preprocessing: Incorporating a preprocessing step during inference to adjust the lighting of the input images could be another approach. This step would aim to bring the image closer to the model's "comfort zone."
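Suggestions (1) and (4) can be sketched with plain NumPy. The function names and the gamma heuristic below are my own illustration, not anything from this repo:

```python
import numpy as np

def equalize_hist(gray: np.ndarray) -> np.ndarray:
    """Global histogram equalization for an 8-bit grayscale image.

    Assumes the image is not constant (otherwise the CDF range is zero).
    """
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    # Map the cumulative distribution onto the full 0..255 range.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]

def adaptive_gamma(gray: np.ndarray) -> np.ndarray:
    """Pick gamma from mean brightness: brighten dark frames, darken bright ones."""
    mean = gray.mean() / 255.0
    # Choose gamma so the mean brightness is pushed toward mid-gray (0.5).
    gamma = np.log(0.5) / np.log(max(mean, 1e-3))
    norm = gray.astype(np.float32) / 255.0
    return np.clip((norm ** gamma) * 255.0, 0, 255).astype(np.uint8)
```

Either function could run as an inference-time preprocessing step before the image reaches the model.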

I would be keen to contribute to this enhancement, whether through dataset curation, model training, or developing preprocessing algorithms. I believe that with a collaborative effort, we can achieve a more resilient OCR system that performs reliably in all lighting conditions.

Thank you for considering my suggestions. I look forward to your thoughts and any guidance on how I might assist in this endeavour.

Best regards, yihong1120

yihong1120 avatar Dec 18 '23 01:12 yihong1120

Hi,

Thanks for writing this, I think what you are proposing here is really cool. We could start with proposed idea (2) and fall back to the other ideas if it doesn't satisfy us. I also had some ideas in mind for this repo:

  1. Look into using a more modern architecture (if it's valuable)
  2. Make this a more language-universal OCR by training it with generated data, i.e. this.
  3. Upgrade to Keras 3.0

Regarding the lighting conditions, we probably need a more significant dataset to validate our results. Maybe we can gather a dataset different from the Argentine plates, or I can look into publishing one (it will take me time).

Also, which images/dataset of low-light plates did you try the OCR on? If you can post some examples, that would be great!

ankandrew avatar Dec 18 '23 22:12 ankandrew

Hi, everyone! I'm impressed with the performance of this model. I want to contribute a small dataset (~2K images) that I extracted from mercadolibre. Do you have a repo for the data?

biodatasciencearg avatar Feb 20 '24 13:02 biodatasciencearg

Hi @biodatasciencearg! That contribution would be welcomed :). I can publish a new release with that dataset. Perhaps you can upload it right here in the comments as a .zip file.

ankandrew avatar Feb 20 '24 20:02 ankandrew

Hello, I'm sending you a Drive link because the file is too big to upload here! Let me know when you've downloaded it so I can delete it! https://drive.google.com/file/d/1zKXjo6i2m0xdLCti793VI2Dy0ewz_kkb/view?usp=drive_link

biodatasciencearg avatar Apr 15 '24 16:04 biodatasciencearg

Hi!

re @yihong1120: I've reworked the repo with some considerations aligned with what you mentioned. Related to your second point, I started using Albumentations, so many more augmentations can now be used out of the box. Some, but not all, are available in their demo https://demo.albumentations.ai/.

re @biodatasciencearg: Great, thanks for the contribution! I've also released the dataset I used to train the original model, see this. I downloaded your dataset, aligned it with the new format, and uploaded it to releases https://github.com/ankandrew/fast-plate-ocr/releases/tag/arg-plates. However, I noticed performance on this dataset is not great, so we should definitely retrain with both datasets combined. Feel free to re-train and I will upload it to the hub; otherwise I'll see if I have time and do it.

ankandrew avatar Apr 15 '24 18:04 ankandrew

Since I didn't include any "zoom out" augmentation and my training dataset's plates were all cropped roughly the same way, that seems to be the reason for the bad performance on @biodatasciencearg's dataset. I compared the same image, just cropped more tightly (so it aligns better with the training dataset).

Image example: 826988-MLA54910508732_042023.jpg

Original image

Screenshot 2024-04-15 at 3 44 49 PM

Confidence: [0.11418319 0.2238257  0.06423701 0.13651179 0.12144833 0.13031216 0.6558619 ]

Cropped image

Screenshot 2024-04-15 at 3 44 19 PM

Confidence: [0.78425103 0.74215186 0.672121   0.774952   0.7979243  0.78554845 0.8182486 ]

I guess some more augmentation can be introduced to make this more robust for different crops.

ankandrew avatar Apr 15 '24 18:04 ankandrew

Hi everyone! @ankandrew Indeed, when the license plate is distorted, recognition becomes difficult. I even tried this service https://portal.vision.cognitive.azure.com/demo/extract-text-from-images and it fails on very tilted license plates.

But honestly, you just need to get the ROIs with this implementation first, and that's it. After that, this model works perfectly and with low latency! https://github.com/claudiojung/iwpod-net

biodatasciencearg avatar Apr 22 '24 15:04 biodatasciencearg

> Hi everyone! @ankandrew Indeed, when the license plate is distorted, recognition becomes difficult. I even tried this service https://portal.vision.cognitive.azure.com/demo/extract-text-from-images and it fails on very tilted license plates.
>
> But honestly, you just need to get the ROIs with this implementation first, and that's it. After that, this model works perfectly and with low latency! https://github.com/claudiojung/iwpod-net

I agree, to get the best performance plates should be cropped properly (as was my original design choice). Closing this.

ankandrew avatar Apr 23 '24 00:04 ankandrew

I am considering the possibility of doing the data augmentation externally, which could potentially include GANs for generating license plates with lighting changes.

Therefore, I would like to ask:

Is it possible to turn off the data augmentation?

Does the input resolution in the current solution have to be 140x70? In the provided dataset I see that this is not the case, and the images are even in RGB.

Is the conversion to B&W happening in the augmentation step?

Does the resolution of motorcycle images also get adjusted to the same resolution?

Thank you very much! elias

biodatasciencearg avatar Sep 06 '24 17:09 biodatasciencearg

Hi!

Is it possible to turn off the data augmentation?

The current way to do it is to pass an empty augmentation pipeline, so it doesn't use the default one. See this.

Does the input resolution in the current solution have to be 140x70?

Nope, the size can be anything. Those numbers are derived from my original dataset's stats.

Is the conversion to B&W happening in the augmentation step?

It happens in the preprocessing phase, but I should change this and make it configurable. I will modify the code later.

Does the resolution of motorcycle images also get adjusted to the same resolution?

Yep, applies to all images.

ankandrew avatar Sep 08 '24 23:09 ankandrew