estimagic icon indicating copy to clipboard operation
estimagic copied to clipboard

Equality constraints involving params values not always caught correctly

Open hmgaudecker opened this issue 2 years ago • 11 comments

Bug description

The following constraint:

{
    'query': "category == 'transition' & name2 == 'x' & name1 == 'y' & not (-1e-10 < value < 1e-10)",
    'type': 'equality'
}

is not caught correctly because in line 62 of process_selectors.py, we have:

selected = evaluator(helper)

with helper (after applying the query):

                                    value  lower_bound
category   period name1      name2                    
transition 0      y          x      200    -inf
           1      y          x      216    -inf
           2      y          x      232    -inf
           3      y          x      248    -inf

That is, the value fields of helper contain the indices in internal_params (I suppose) instead of the value field of the original params DataFrame.

What I am actually trying to achieve is to exclude the rows with values 200 and 248 in the above DataFrame because they are fixed to zero. The corresponding constraints may change as I try different specifications, so I want to do so in a programmatic way -- I'd be curious to know if there were better options.

To Reproduce

I'd hope the above is clear enough, but I can try to distill an example.

Expected behavior

Rows values 200 and 248 should not appear in the result of evaluator(helper) because their value in the DataFrame passed to estimate_ml as the params argument is 0.0.

System

  • Version 0.4.3

Potential solution

Around line 40 of process_selectors, define helper as:

helper = params.copy()
helper["internal_index"] = tree_converter.params_unflatten(np.arange(n_params))["value"]

and adjust _get_selection_evaluator accordingly. This would work out-of-the box for field == "query" with an appropriate adjustment of the key (value -> internal_index in line 181). However, I do not see through what it would imply for other types of fields.

hmgaudecker avatar Feb 28 '23 05:02 hmgaudecker

Thanks for the detailed issue and debugging!

I think this is a general problem that occurs if a query or selector uses parameter values.

The reason why we replace params with the helper is that the selected helper values will be the indices of a flattened params vector.

Changing this in general is actually a non-trivial problem and it will take a while to fix this.

janosg avatar Feb 28 '23 09:02 janosg

Thanks, I understand that. But why would you require the "value" field in the helper and can't just use an arbitrary name?

hmgaudecker avatar Feb 28 '23 10:02 hmgaudecker

We use the "value" field because it avoids rewriting tree flattening rules. But we might be able to avoid it.

The whole problem would be trivial to fix if we treated your case of 1 params DataFrame as a special case. Since users can also use DataFrames deep inside nested dictionaries, our solution needs to be more general.

For your case, a simple workaround could be to evaluate your queries on the start params and replace them by locs. i.e.

new_constraints = []
for constr in constraints:
    if "query" in constr:
        new_constr = constr.copy()
        q = new_constr.pop("query")
        new_constr["loc"] = start_params.query(q).index
        new_constraints.append(new_constr)
    else:
        new_constraints.append(constr)

janosg avatar Mar 01 '23 09:03 janosg

Almost. This works well for the entire start_params, but helper only has the parameters that actually vary in the optimization problem. Hence, the index of parameters that are fixed does not exist and the estimation errors out with an IndexError.

I'll try to think through some further workaround...

hmgaudecker avatar Mar 01 '23 11:03 hmgaudecker

Yes, I would run this on your start params before you pass the constraints to estimagic.

janosg avatar Mar 01 '23 12:03 janosg

That is what I did. But different .loc-constraints converted this way fail because the corresponding index is not present anymore in helper.

hmgaudecker avatar Mar 01 '23 13:03 hmgaudecker

Put differently,

>>> len(start_params)
360
>>> len(helper)
80

A query just returns an empty object, a loc-constraint yields an IndexError.

hmgaudecker avatar Mar 01 '23 13:03 hmgaudecker

I am very surprised.

We define helper as:

n_params = len(tree_converter.params_flatten(params))
helper = tree_converter.params_unflatten(np.arange(n_params))

This is before any constraints handling is done, i.e. at this point we do not even know which parameters will be fixed later. helper should look exactly like params, except for replacing parameter values with integers ranging from 0 to n_params - 1.

janosg avatar Mar 01 '23 13:03 janosg

To explain a bit more: We always treat parameters as a general pytree that can be flattened and unflattened. Even if params is just one DataFrame.

A solution like:

helper = params.copy()
helper["internal_index"] = tree_converter.params_unflatten(np.arange(n_params))["value"]

Assumes that params is something that has a value column and can be assigned new columns. We cannot make this assumption in general.

janosg avatar Mar 01 '23 13:03 janosg

Thanks, I'll run it through the debugger again asap and will report back with more details. Probably my mistake somewhere.

Thinking of it: Most likely explanation is that this error occurs now during the generation of sensible starting values for a full skillmodels run, i.e. the reduction in parameters is because of that. So will have to do the conversion from query to loc before setting each different model to run!

hmgaudecker avatar Mar 01 '23 14:03 hmgaudecker

That was it.

I'd think the use case is sufficiently rare to not support it. The way to go would be to exclude if a regex like string_to_error_out here:

delimit = r"[ \n&|><=()]"
string_to_error_out = f"{delimit}value{delimit}"

shows up in a query.

Error message should include an explanation and your workaround, I am using this function now:

def convert_queries_to_locs(start_params, constraints):
    new_constraints = []
    for constr in constraints:
        if "query" in constr:
            new_constr = constr.copy()
            q = new_constr.pop("query")
            q_result = start_params.query(q)
            if not q_result.empty:
                new_constr["loc"] = q_result.index
                new_constraints.append(new_constr)
        else:
            new_constraints.append(constr)
    return new_constraints

hmgaudecker avatar Mar 02 '23 05:03 hmgaudecker

As you say, this is sufficiently rare that we don't need to support it. Moreover, it's made partially obsolete with #495 where we deprecate queries as a means to select parameters (they can still be used inside selectors of course, but estimagic won't know about that). Your fix will remain mostly unaffected.

The documentation of constraints will be rewritten completely in the process and I'll make sure that this case is explained clearly.

janosg avatar Jul 10 '24 09:07 janosg