helpers icon indicating copy to clipboard operation
helpers copied to clipboard

Propose a unit test plan for hopenai

Open gpsaggese opened this issue 7 months ago • 6 comments

We want to get to 80% test coverage

  • E.g., the cost mechanism is not tested

Use code coverage doc in the docs but also there is a tool we use @Shaunak01

  • [x] Make proposal in a PR (what needs to be tested, how it needs to be tested)

gpsaggese avatar May 21 '25 13:05 gpsaggese

@srinivassaitangudu You can read abt the coverage for helpers at: Coverage.md

Shaunak01 avatar May 21 '25 14:05 Shaunak01

You can run jackmd "coverage" to find all the coverage related docs, e.g., there is also docs/tools/all.invoke_workflows.how_to_guide.md

gpsaggese avatar May 24 '25 21:05 gpsaggese

@srinivassaitangudu to make propose a plan

gpsaggese avatar May 28 '25 13:05 gpsaggese

@gpsaggese I have few doubts in hopenai.py

  1. Why we are doing isinstance() for type_=="is_string" in convert_to_type()?
def convert_to_type(col, type_):
    if type_ == "is_bool":
        return col.map(
            lambda x: isinstance(x, bool)
            or x in ["True", "False", "true", "false"]
            or x in [1, 0, "1", "0"]
        )
    elif type_ == "is_int":
        return pd.to_numeric(col, errors="coerce")
    elif type_ == "is_numeric":
        return pd.to_numeric(col, errors="coerce")
    elif type_ == "is_string":
        return col.map(lambda x: isinstance(x, str))
    else:
        raise ValueError(f"Unknown column type: {type_}")
  1. In convert_df() we are not using convert_to_type(), Is there any reason?
def convert_df(
    df: pd.DataFrame, *, print_invalid_values: bool = False
) -> pd.DataFrame:
    types = df.apply(lambda x: pd.Series(infer_column_types(x)))
    df_out =  pd.DataFrame()
    for col in df.columns:
        if types[col]["type"] == "is_bool":
            df_out[col] = df[col].astype(bool)
        elif types[col]["type"] == "is_numeric":
            df_out[col] = df[col].astype(float)
        elif types[col]["type"] == "is_string":
            df_out[col] = df[col]
        else:
            raise ValueError(f"Unknown column type: {types[col]['type']}")
    return df_out
  1. in the if statement, there should be a condition to check if the values are non-zero, if both values are zero, the type_ will be "is_bool" which we don't want.
def infer_column_types(col):
    vals = {
        "is_numeric": pd.to_numeric(col, errors="coerce").notna(),
        #'is_datetime': pd.to_datetime(col, errors='coerce').notna(),
        "is_bool": col.map(lambda x: isinstance(x, bool)),
        "is_string": col.map(lambda x: isinstance(x, str)),
    }
    vals = {k: float(v.mean()) for k, v in vals.items()}
    # type_ = np.where(vals["is_bool"] >= vals["is_numeric"], "is_bool",
    #                  (vals["is_numeric"] >= vals["is_string"], "is_numeric",
    #                  "is_string"))
    if vals["is_bool"] >= vals["is_numeric"] :
        type_ = "is_bool"
    elif vals["is_numeric"] >= vals["is_string"]:
        type_ = "is_numeric"
    else:
        type_ = "is_string"
    vals["type"] = type_
    return vals

srinivassaitangudu avatar May 29 '25 14:05 srinivassaitangudu

@gpsaggese Please review the PR #777.

srinivassaitangudu avatar May 29 '25 18:05 srinivassaitangudu

The functions convert_to_type, etc. are part of a different PR. In practice, in this PR I downloaded the data from openrouter but everything was a string and so I started writing code to solve this general problem of inferring types.

  • That code should go in hpandas.py and you should put that in a different PR and improve it. I'm sure there are bugs and it's not tested. I'll file a separate issue for that

gpsaggese avatar May 30 '25 13:05 gpsaggese