refactor: use frformat package

Open pierrecamilleri opened this issue 7 months ago • 0 comments

Copy of #83 without the fork, to try to trigger the CI tests.

Context

The fr-format library was developed to share validation functions between validata and csv-detective, and to provide a standard library for validating common French formats.

The aim of this PR is to replace csv-detective's custom validation code with the fr-format implementations.

Refactorings

  • code postal and code commune Insee
  • code canton, numero departement, code region
  • code fantoir
  • canton, departement, commune, region, pays
  • Latitude_l93, Longitude_l93
  • code RNA

Behavior changes

  • After noticing Fantoir codes whose prefix starts with two letters, we slightly changed the regex to accept this format (e.g. ZB12A).
  • CodeRNA does not allow 'w' as first letter, only uppercase 'W'.
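
As a rough illustration of the two behavior changes above, here is a minimal Python sketch. The patterns are hypothetical stand-ins written for this example; they are not copied from fr-format or csv-detective, and the RNA shape (`W` followed by nine digits) is a simplifying assumption.

```python
import re

# Hypothetical patterns, for illustration only (not the fr-format regexes).
# "Strict" Fantoir shape: one alphanumeric character, three digits, one key letter.
FANTOIR_STRICT = re.compile(r"[0-9A-Z][0-9]{3}[A-Z]")
# "Relaxed" shape: the first two characters may both be letters (e.g. ZB12A).
FANTOIR_RELAXED = re.compile(r"[0-9A-Z]{2}[0-9]{2}[A-Z]")
# Assumed RNA shape: an uppercase 'W' followed by nine digits.
RNA_UPPERCASE_ONLY = re.compile(r"W[0-9]{9}")


def matches(pattern: re.Pattern, value: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    for code in ("ZB12A", "1234A"):
        print(code, matches(FANTOIR_STRICT, code), matches(FANTOIR_RELAXED, code))
    for code in ("W123456789", "w123456789"):
        print(code, matches(RNA_UPPERCASE_ONLY, code))
```

Under these assumed patterns, `ZB12A` is rejected by the strict Fantoir shape and accepted by the relaxed one, while a lowercase leading `w` is rejected for RNA codes.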

Performance

Performance report ─=≡Σ((( つ•̀ω•́)つ

Testing table with 100000000 rows

                     without fr-format    with fr-format

"8730"

code_postal          10.27 s              10.62 s
code_fantoir         10.22 s              10.66 s
code_commune         10.27 s              10.47 s

"ABCDE"

code_postal          12.32 s              11.40 s
code_fantoir         12.24 s              11.37 s
code_commune         11.73 s              11.33 s

"12345"

code_postal          11.63 s              11.38 s
code_fantoir         11.23 s              11.05 s
code_commune         11.31 s              10.97 s

The differences do not appear to be statistically significant, given the variability observed between two executions.
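
One way to put a number on that run-to-run variability is to repeat each measurement several times and report the spread. The sketch below is only an illustration of that idea: `is_valid` is a hypothetical stand-in validator, not the csv-detective or fr-format code, and the call counts are far smaller than the table above.

```python
import re
import statistics
import timeit

# Hypothetical stand-in validator (five digits), for illustration only.
PATTERN = re.compile(r"[0-9]{5}")


def is_valid(value: str) -> bool:
    return PATTERN.fullmatch(value) is not None


def time_validator(func, value, calls=10_000, repeats=5):
    """Return (mean, stdev) in seconds over `repeats` runs of `calls` calls each."""
    runs = timeit.repeat(lambda: func(value), number=calls, repeat=repeats)
    return statistics.mean(runs), statistics.stdev(runs)


if __name__ == "__main__":
    for sample in ("8730", "ABCDE", "12345"):
        mean, stdev = time_validator(is_valid, sample)
        print(f"{sample!r}: {mean:.4f} s +/- {stdev:.4f} s")
```

If the gap between two implementations is smaller than the standard deviation across repeats, it is reasonable to treat it as noise.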

Edited on 29 May 2024

pierrecamilleri, Jul 26 '24 15:07