ingest-file
Validate IBANs
ingest-file extracts IBANs using a fairly simple regex, which can produce a lot of false positives. ingest-file could validate each match further in order to improve precision:
- Validating the length depending on country
- Validating checksums
- …
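To make the two checks above concrete, here is a minimal sketch in plain Python. It is not what ingest-file currently does. The function names and the `IBAN_LENGTHS` table are made up for illustration, and the table covers only a handful of countries; a real implementation would need the full IBAN registry. The checksum follows the standard mod-97 rule: move the first four characters to the end, map letters to numbers (A=10 … Z=35), and check that the result modulo 97 equals 1.

```python
import re

# Illustrative subset only (assumption): a real implementation would need
# the lengths for every country in the IBAN registry.
IBAN_LENGTHS = {"DE": 22, "GB": 22, "FR": 27, "NL": 18, "ES": 24, "IT": 27, "CH": 21}


def normalize(candidate: str) -> str:
    """Remove whitespace and uppercase the candidate match."""
    return re.sub(r"\s+", "", candidate).upper()


def has_valid_length(iban: str) -> bool:
    """Check the length against the expected length for the country code."""
    expected = IBAN_LENGTHS.get(iban[:2])
    return expected is not None and len(iban) == expected


def has_valid_checksum(iban: str) -> bool:
    """Apply the mod-97 check to a normalized IBAN."""
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]+", iban):
        return False
    # Move country code and check digits to the end, then map
    # characters to numbers: digits stay as-is, A=10 ... Z=35.
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def is_plausible_iban(candidate: str) -> bool:
    """Combine both checks for a raw regex match."""
    iban = normalize(candidate)
    return has_valid_length(iban) and has_valid_checksum(iban)
```

With this sketch, `is_plausible_iban("DE89 3704 0044 0532 0130 00")` passes both checks, while the same string with the last digit changed fails the checksum. Countries missing from the table are rejected, which is only acceptable for an experiment.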
We should keep in mind that the text the extraction is performed on is often the result of OCR processing, which may recognize characters incorrectly. If an IBAN’s checksum isn’t correct, that may be because OCR misread a character rather than because the match is a false positive.
I would add:
- validation of the first two characters, which stand for the country code
Perhaps as a first step we could check out how far we would get by using a library like https://pypi.org/project/schwifty/
@Okssana what would greatly help here is a list of IBANs to test with, in either text or document form (PDFs, images). Would you be able to add some to this ticket if they come your way?