Validate IBANs

Open tillprochaska opened this issue 2 years ago • 3 comments

ingest-file extracts IBANs using a rather simple regex, which can produce a lot of false positives. ingest-file could validate the matches further in order to improve precision:

  • Validating the length depending on country
  • Validating checksums
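To make the two checks above concrete, here is a minimal sketch in plain Python. It is not ingest-file's code: the function names and the `IBAN_LENGTHS` table are hypothetical, and the table only covers a handful of countries for illustration (a real implementation would need the full IBAN registry). The checksum is the standard mod-97 check: move the first four characters to the end, map letters to numbers (A=10 ... Z=35), and test that the resulting integer is 1 modulo 97.

```python
import re

# Hypothetical, illustrative subset: country code -> total IBAN length.
# A real implementation would need the complete IBAN registry.
IBAN_LENGTHS = {"DE": 22, "GB": 22, "FR": 27, "NL": 18, "ES": 24, "CH": 21}

# Generic shape: two letters, two check digits, then the alphanumeric BBAN.
IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}")


def normalize(candidate: str) -> str:
    """Strip whitespace and hyphens and upper-case the candidate."""
    return re.sub(r"[\s-]", "", candidate).upper()


def has_valid_checksum(iban: str) -> bool:
    """Mod-97 check on an already normalized, well-shaped IBAN."""
    rearranged = iban[4:] + iban[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def check_iban(candidate: str) -> str:
    """Return 'ok' or a short reason why the candidate was rejected."""
    iban = normalize(candidate)
    if not IBAN_SHAPE.fullmatch(iban):
        return "bad_format"
    expected_length = IBAN_LENGTHS.get(iban[:2])
    if expected_length is None:
        # Also returned for real countries missing from the subset above.
        return "unknown_country"
    if len(iban) != expected_length:
        return "bad_length"
    if not has_valid_checksum(iban):
        return "bad_checksum"
    return "ok"
```

For example, `check_iban("DE89 3704 0044 0532 0130 00")` returns `"ok"`, while changing the last digit to `1` yields `"bad_checksum"`. Returning a reason instead of a plain boolean is a design choice in this sketch: it would let the caller treat a checksum failure differently from a wrong length or country code.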

We should keep in mind that the text the extraction runs on is often the result of OCR processing, which may recognize characters incorrectly. If an IBAN's checksum isn't correct, that doesn't necessarily mean the match is a false positive; OCR may simply have misread a character of a genuine IBAN.

tillprochaska, Jan 13 '23 12:01

I would add:

  • validating the first two characters, which stand for the country code

Okssana, Jan 13 '23 13:01

Perhaps as a first step we could check how far we would get by using a library like https://pypi.org/project/schwifty/

stchris, Jan 19 '23 09:01

@Okssana what would greatly help here is a list of IBANs to test with, in either text or document form (PDFs, images). Would you be able to add some to this ticket if they come your way?

stchris, Jan 19 '23 09:01