bofire [Bug]: silent failure for model_type strings whose quadratic terms are not separated by {}'s

What happened?

Having non-deliminated-by {}'s quadratic terms just causes Formulaic to drop them. The hint i had that the preferred way of dealing with quadratics in Formulaic is via {}s was one line of one test, no comment. .

Request: do not just fail silently. Add a regex that either fixes the situation or error out.

That is, the following should all be equivalent: {x**2}, {x*x}, {x * x}, x**2, x*x, x * x, x:x
Currently, only these are equivalent {x**2}, {x*x}, {x * x}, {x : x}
and these are equivalently not seen and treated as null: x**2, x*x, x * x, x:x

Moreover, Formula() applied to the output of get_formula_from_string where model_type is a shortcut (e.g. "fully-quadratic") should be idempotent. Using the domain below, we have formula = get_formula_from_string(model_type="fully-quadratic", inputs=inputs) produces the string "1 + x1 + x2 + x3 + x1 ** 2 + x2 ** 2 + x3 ** 2 + x1:x2 + x1:x3 + x2:x3" and applying Formula() to this string yields "1 + x1 + x2 + x3 + x1:x2 + x1:x3 + x2:x3". get_formula_from_string should instead return "1 + x1 + x2 + x3 + {x1 ** 2} + {x2 ** 2} + {x3 ** 2} + x1:x2 + x1:x3 + x2:x3", which remains fixed under application of Formula() (desired behavior).

Behavior demonstration:

import os

from bofire.data_models.api import Domain, Inputs, Outputs
from bofire.data_models.features.api import (
    ContinuousInput,
    ContinuousOutput,
)
from bofire.strategies.doe.utils import get_formula_from_string

models = {
    "linear-and-quadratic": "linear-and-quadratic",
    "model_1": "1 + x1 + x2 + x3 + x1 ** 2 + x2 ** 2 + x3 ** 2",
    "model_1a": "1 + x1 + x2 + x3 + {x1 ** 2} + {x2 ** 2} + {x3 ** 2}",
    "model_2": "1 + x1 + x2 + x3 + x1*x1 + x2*x2 + x3*x3",
    "model_2a": "1 + x1 + x2 + x3 + {x1*x1} + {x2*x2} + {x3*x3}",
    "model_3": "1 + x1 + x2 + x3 + x1 * x1 + x2 * x2 + x3 * x3",
    "model_3a": "1 + x1 + x2 + x3 + {x1 * x1} + {x2 * x2} + {x3 * x3}",
}

for name, expr in models.items():
    formula = get_formula_from_string(model_type=expr, inputs=inputs)
    print(
        f"{name}: {expr}.\n\tAs a Formula: {formula}.\n\tExpected experiments: {len(formula) + 3}"
    )

Outputs:

Please provide a minimal, reproducible example of the unexpected behavior.

see above

Please paste any relevant traceback/logs produced by the example provided.

BoFire Version

0.2.1

Python Version

3.10.12

Operating System

WSL: Ubuntu-22.04

Jul 17 '25 15:07 rosonaeldred

len(Formula(model string)) is not used consistently everywhere, see https://github.com/experimental-design/bofire/blob/7fb596e0968e48c82bb4e64e30bbef9ed1477af4/bofire/strategies/doe/objective.py#L70


        self.formula = deepcopy(formula)
        self.model_terms_string_expression = convert_formula_to_string(
            domain=domain, formula=formula
        )
        self.n_model_terms = len(list(np.array(formula, dtype=str)))

Jul 24 '25 06:07 rosonaeldred

Also concerning https://github.com/experimental-design/bofire/blob/7fb596e0968e48c82bb4e64e30bbef9ed1477af4/bofire/strategies/doe/utils.py#L155-L184

Jul 28 '25 12:07 rosonaeldred

This is more complex than I can handle. Need an assist from @dlinzner-bcs @R-M-Lee @bertiqwerty e.g.

Aug 04 '25 14:08 rosonaeldred

Hi @rosonaeldred,

as you maybe noticed I don't really find time to work on bofire stuff at the moment anymore :/ But now I am sitting in the train for quite a while and I had time to look into this issue a bit. Here is what I can tell about this issue, hoping it helps:

Regarding the point about get_formula_from_string(): Here I do not see a problem since it returns not a string but a formula object, which indeed contains all intended terms and remains unchanged w.r.t. to repeated applications of Formula() (see code and output below). I think we do not have control over the string representation of Formula objects (since they are defined in formulaic). So we will have to will live with the fact that the wavy brackets will be lost in the string representations, the returned Formula objects are, however, still unchanged under repeated application of Formula().

import os

from bofire.data_models.api import Domain, Inputs, Outputs
from bofire.data_models.features.api import (
    ContinuousInput,
    ContinuousOutput,
)
from bofire.strategies.doe.utils import get_formula_from_string
from bofire.data_models.domain.api import Inputs
from formulaic import Formula

domain = Domain.from_lists(
    inputs=[
        ContinuousInput(
            key=f"x{i + 1}",
            bounds=(0, 1),
        )
        for i in range(3)
    ],
    outputs=[ContinuousOutput(key="y")],
)

models = {
    "linear-and-quadratic": "linear-and-quadratic",
    "model_1": "1 + x1 + x2 + x3 + x1 ** 2 + x2 ** 2 + x3 ** 2",
    "model_1a": "1 + x1 + x2 + x3 + {x1 ** 2} + {x2 ** 2} + {x3 ** 2}",
    "model_2": "1 + x1 + x2 + x3 + x1*x1 + x2*x2 + x3*x3",
    "model_2a": "1 + x1 + x2 + x3 + {x1*x1} + {x2*x2} + {x3*x3}",
    "model_3": "1 + x1 + x2 + x3 + x1 * x1 + x2 * x2 + x3 * x3",
    "model_3a": "1 + x1 + x2 + x3 + {x1 * x1} + {x2 * x2} + {x3 * x3}",
}

for name, expr in models.items():
    formula = get_formula_from_string(model_type=expr, inputs=domain.inputs)
    print(name)
    print("expression:        ", expr)
    print("formula:           ", formula, ", ", list(formula))
    print("Formular(formula): ", Formula(formula), ", ", list(Formula(formula)))
    print()

output:

linear-and-quadratic
expression:         linear-and-quadratic
formula:            1 + x1 + x2 + x3 + x1 ** 2 + x2 ** 2 + x3 ** 2 ,  [1, x1, x2, x3, x1 ** 2, x2 ** 2, x3 ** 2]
Formular(formula):  1 + x1 + x2 + x3 + x1 ** 2 + x2 ** 2 + x3 ** 2 ,  [1, x1, x2, x3, x1 ** 2, x2 ** 2, x3 ** 2]

model_1
expression:         1 + x1 + x2 + x3 + x1 ** 2 + x2 ** 2 + x3 ** 2
formula:            1 + x1 + x2 + x3 ,  [1, x1, x2, x3]
Formular(formula):  1 + x1 + x2 + x3 ,  [1, x1, x2, x3]

model_1a
expression:         1 + x1 + x2 + x3 + {x1 ** 2} + {x2 ** 2} + {x3 ** 2}
formula:            1 + x1 + x2 + x3 + x1 ** 2 + x2 ** 2 + x3 ** 2 ,  [1, x1, x2, x3, x1 ** 2, x2 ** 2, x3 ** 2]
Formular(formula):  1 + x1 + x2 + x3 + x1 ** 2 + x2 ** 2 + x3 ** 2 ,  [1, x1, x2, x3, x1 ** 2, x2 ** 2, x3 ** 2]

model_2
expression:         1 + x1 + x2 + x3 + x1*x1 + x2*x2 + x3*x3
formula:            1 + x1 + x2 + x3 ,  [1, x1, x2, x3]
Formular(formula):  1 + x1 + x2 + x3 ,  [1, x1, x2, x3]

model_2a
expression:         1 + x1 + x2 + x3 + {x1*x1} + {x2*x2} + {x3*x3}
formula:            1 + x1 + x2 + x3 + x1 * x1 + x2 * x2 + x3 * x3 ,  [1, x1, x2, x3, x1 * x1, x2 * x2, x3 * x3]
Formular(formula):  1 + x1 + x2 + x3 + x1 * x1 + x2 * x2 + x3 * x3 ,  [1, x1, x2, x3, x1 * x1, x2 * x2, x3 * x3]

model_3
expression:         1 + x1 + x2 + x3 + x1 * x1 + x2 * x2 + x3 * x3
formula:            1 + x1 + x2 + x3 ,  [1, x1, x2, x3]
Formular(formula):  1 + x1 + x2 + x3 ,  [1, x1, x2, x3]

The parsing works according to the Formula grammar (https://matthewwardrop.github.io/formulaic/latest/guides/grammar/) described in the formulaic documentation.

I think it will be a lot of work to adapt this grammar (i.e. "fix" formula expressions automatically) while still correctly parsing the most general model formulas with formulaic. But indeed the grammar can be counter-intuitive. Therefore I would think it is better to take the second approach that you suggested and throw a warning (better not an error imo) if we suspect that the parsing in formulaic will lead to a formula that doesn't correspond to the one intended by the user.

This could be implemented like this: We throw a warning based on the graph that is built in the background by formulaic. In detail, an warning (telling the user to double check if they adhere to the formulaic grammar) is shown, iff:

An ASTNode has an operator which is not :,*,+, or - (catching loss of terms with power functions such as x1 ** 2)
An ASTNode with the operator * or : has two equal arguments (catching loss of terms like x1*x1) We do not catch cases with higher order terms like x1*x2*x1 (which is equivalent to x1*x2 instead of {x1*x2*x2}) with this method. But we could extend 2. by throwing a warning if an argument occurs twice when you follow the child nodes until the end of the tree. However this will lead to unnecessary warnings in some cases (e.g. x1*(0*x1 + x2)).

I do not know enough about the internal mechanisms of formulaic to foresee if this will lead to unnecessary warnings in additional, a bit more exotic cases. But for sure it should warn the user in the cases you mentioned above.

We can add a comment with a reference to the Formulaic grammar in the documentation so that everyone, in principle, knows what they have to do.

What do you think about this? I think I can implement it somewhen soon, if you or @dlinzner-bcs etc agree.

Cheers, Aaron :)

Aug 04 '25 22:08 Osburg

bofire bofire copied to clipboard

[Bug]: silent failure for model_type strings whose quadratic terms are not separated by {}'s

What happened?

Please provide a minimal, reproducible example of the unexpected behavior.

Please paste any relevant traceback/logs produced by the example provided.

BoFire Version

Python Version

Operating System

bofire
bofire copied to clipboard