Propose a unit test plan for hopenai
We want to get to 80% test coverage
- E.g., the cost mechanism is not tested
Use code coverage doc in the docs but also there is a tool we use @Shaunak01
- [x] Make proposal in a PR (what needs to be tested, how it needs to be tested)
@srinivassaitangudu You can read abt the coverage for helpers at: Coverage.md
You can run jackmd "coverage" to find all the coverage related docs, e.g., there is also docs/tools/all.invoke_workflows.how_to_guide.md
@srinivassaitangudu to make propose a plan
@gpsaggese I have few doubts in hopenai.py
- Why we are doing
isinstance()fortype_=="is_string"inconvert_to_type()?
def convert_to_type(col, type_):
if type_ == "is_bool":
return col.map(
lambda x: isinstance(x, bool)
or x in ["True", "False", "true", "false"]
or x in [1, 0, "1", "0"]
)
elif type_ == "is_int":
return pd.to_numeric(col, errors="coerce")
elif type_ == "is_numeric":
return pd.to_numeric(col, errors="coerce")
elif type_ == "is_string":
return col.map(lambda x: isinstance(x, str))
else:
raise ValueError(f"Unknown column type: {type_}")
- In
convert_df()we are not usingconvert_to_type(), Is there any reason?
def convert_df(
df: pd.DataFrame, *, print_invalid_values: bool = False
) -> pd.DataFrame:
types = df.apply(lambda x: pd.Series(infer_column_types(x)))
df_out = pd.DataFrame()
for col in df.columns:
if types[col]["type"] == "is_bool":
df_out[col] = df[col].astype(bool)
elif types[col]["type"] == "is_numeric":
df_out[col] = df[col].astype(float)
elif types[col]["type"] == "is_string":
df_out[col] = df[col]
else:
raise ValueError(f"Unknown column type: {types[col]['type']}")
return df_out
- in the
ifstatement, there should be a condition to check if the values are non-zero, if both values are zero, the type_ will be"is_bool"which we don't want.
def infer_column_types(col):
vals = {
"is_numeric": pd.to_numeric(col, errors="coerce").notna(),
#'is_datetime': pd.to_datetime(col, errors='coerce').notna(),
"is_bool": col.map(lambda x: isinstance(x, bool)),
"is_string": col.map(lambda x: isinstance(x, str)),
}
vals = {k: float(v.mean()) for k, v in vals.items()}
# type_ = np.where(vals["is_bool"] >= vals["is_numeric"], "is_bool",
# (vals["is_numeric"] >= vals["is_string"], "is_numeric",
# "is_string"))
if vals["is_bool"] >= vals["is_numeric"] :
type_ = "is_bool"
elif vals["is_numeric"] >= vals["is_string"]:
type_ = "is_numeric"
else:
type_ = "is_string"
vals["type"] = type_
return vals
@gpsaggese Please review the PR #777.
The functions convert_to_type, etc. are part of a different PR. In practice, in this PR I downloaded the data from openrouter but everything was a string and so I started writing code to solve this general problem of inferring types.
- That code should go in hpandas.py and you should put that in a different PR and improve it. I'm sure there are bugs and it's not tested. I'll file a separate issue for that