
The calculation in metrics.regex_rows() is not consistent with the documentation

Open • jichaoz opened this issue 4 years ago • 1 comment

https://github.com/georgianpartners/foreshadow/blob/c2c213e0009cfdcf0aa9df75f0a6cf4c983d7090/foreshadow/metrics.py#L184

At this line, each row should contribute a value of 0 or 1 (no match or match) before the values are summed. Instead, each row contributes the length of its match, so the final score can be larger than 1. Here is code to reproduce the issue:

```python
import pandas as pd
from foreshadow.concrete import DollarFinancialCleaner

# Every value in this column is a valid dollar amount.
x = pd.DataFrame({'price': ['$3', '$5.0', '$5,000.00']})
financial_cleaner = DollarFinancialCleaner()
metric = financial_cleaner.metric_score(x)
print(metric)
```

The expected value is 1, but 4.2 is returned instead.
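To make the difference concrete, here is a minimal sketch that scores the same column both ways. It does not reproduce foreshadow's code: `DOLLAR_PATTERN`, `score_by_match_length` and `score_by_indicator` are hypothetical names, and the sketch assumes the per-row values are summed and then divided by the number of rows. Because the pattern is only a stand-in for the cleaner's real regex, the length-based score comes out as 5.0 here rather than the 4.2 reported above.

```python
import pandas as pd

# Hypothetical dollar pattern for illustration only; foreshadow's actual regex may differ.
DOLLAR_PATTERN = r"\$?\d{1,3}(?:,\d{3})*(?:\.\d+)?"


def score_by_match_length(column: pd.Series, pattern: str) -> float:
    """Mimics the reported behaviour: each row contributes the length of its match."""
    # str.extract with one capture group returns the matched text per row (NaN if no match).
    matched = column.astype(str).str.extract(f"({pattern})", expand=False)
    # Non-matching rows contribute 0; the total is averaged over all rows.
    return float(matched.str.len().fillna(0).sum() / len(column))


def score_by_indicator(column: pd.Series, pattern: str) -> float:
    """Documented behaviour: each row contributes 1 if it matches, else 0."""
    # str.contains returns a boolean Series, so its mean is a fraction in [0, 1].
    return float(column.astype(str).str.contains(pattern, regex=True).mean())


x = pd.DataFrame({"price": ["$3", "$5.0", "$5,000.00"]})
print(score_by_match_length(x["price"], DOLLAR_PATTERN))  # 5.0  (2 + 4 + 9 = 15, over 3 rows)
print(score_by_indicator(x["price"], DOLLAR_PATTERN))     # 1.0
```

Since the indicator version averages booleans, its result cannot leave the range [0, 1], which matches the 0-or-1-per-row behaviour described above. `str.contains` counts a row as matching if the pattern occurs anywhere in it; `str.fullmatch` would be the stricter choice if the whole value has to match.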

jichaoz, Sep 23 '19 16:09