
Issue in setter of y_col properties for objects of class DoubleMLData: Objects mutate despite the fact that a ValueError was raised

Open MalteKurz opened this issue 2 years ago • 0 comments

Describe the bug

Bug reported by @ShreyDixit:

Assume that an object of class DoubleMLData has been initialized successfully and that a property such as y_col is then altered in a way that violates a basic assumption (e.g., the same variable cannot be both the outcome variable y_col and a treatment variable in d_cols). A ValueError is raised, but the object is mutated anyway and ends up violating the assumption.

So although the ValueError is raised appropriately, the object still mutates and the y_col property is changed. The root cause is in the setter for the y_col property: https://github.com/DoubleML/doubleml-for-py/blob/0690cc65895feb73b9a338cabde290ee72cf0feb/doubleml/double_ml_data.py#L353-L365

The value should not be set until all checks have passed. In its current form, however, the _check_disjoint_sets() check requires that the properties have already been set. The same issue applies to the setters of the other properties, such as d_cols and x_cols. Note that it only becomes relevant once an object of class DoubleMLData has been initialized successfully and the user then alters one of the properties in a way that violates _check_disjoint_sets().
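One possible way to get the "check first, assign afterwards" order is to let the check take the candidate values as arguments instead of reading them from the object. The following is only a minimal sketch under that assumption: ToyData is a hypothetical stand-in, not the actual DoubleMLData class, and the signature of _check_disjoint_sets() used here is invented for illustration.

```python
class ToyData:
    """Hypothetical stand-in for DoubleMLData (not the library's code)."""

    def __init__(self, y_col, d_cols):
        d_cols = list(d_cols)
        self._check_disjoint_sets(y_col, d_cols)
        self._y_col = y_col
        self._d_cols = d_cols

    @staticmethod
    def _check_disjoint_sets(y_col, d_cols):
        # Assumed signature: works on candidate values, so it can run
        # before anything has been assigned to the object.
        if y_col in d_cols:
            raise ValueError(
                f"{y_col} cannot be set as outcome variable ``y_col`` "
                "and treatment variable in ``d_cols``."
            )

    @property
    def y_col(self):
        return self._y_col

    @y_col.setter
    def y_col(self, value):
        if not isinstance(value, str):
            raise TypeError("The outcome variable y_col must be of str type.")
        # Validate the candidate value against the current d_cols first ...
        self._check_disjoint_sets(value, self._d_cols)
        # ... and only assign once every check has passed.
        self._y_col = value


data = ToyData(y_col="y", d_cols=["d"])
try:
    data.y_col = "d"
except ValueError as err:
    print(err)
print(data.y_col)  # y
```

An alternative with the same effect would be to remember the old value, assign, run the existing check, and restore the old value if the check raises. Either way the setter becomes all-or-nothing.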

Minimum reproducible code snippet

Code block 1

from doubleml.datasets import make_plr_CCDDHNR2018
dml_data = make_plr_CCDDHNR2018()
print(dml_data.y_col)
dml_data.y_col = 'd'  # raises ValueError, because 'd' is already the treatment variable

Code block 2

print(dml_data.y_col)

Expected Result

First code block: dml_data.y_col == 'y' is printed, and the assignment raises the exception

ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.

Second code block: dml_data.y_col == 'y' should still hold.

Actual Result

First code block: dml_data.y_col == 'y' is printed, and the assignment raises the exception

ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.

Second code block: dml_data.y_col == 'd'
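The gap between expected and actual result could be turned into a regression check along the following lines. This is a sketch only: setter_leaves_state_unchanged() and BuggyData are hypothetical names introduced here, and BuggyData merely imitates the set-then-check order described above so that the snippet runs without DoubleML installed.

```python
def setter_leaves_state_unchanged(obj, attr, bad_value, exc_type=ValueError):
    """Return True if a rejected assignment leaves obj.<attr> unchanged."""
    before = getattr(obj, attr)
    try:
        setattr(obj, attr, bad_value)
    except exc_type:
        return getattr(obj, attr) == before
    raise AssertionError(f"assigning {bad_value!r} did not raise {exc_type.__name__}")


class BuggyData:
    """Hypothetical stand-in that assigns first and checks afterwards."""

    def __init__(self):
        self._y_col = "y"
        self._d_cols = ["d"]

    @property
    def y_col(self):
        return self._y_col

    @y_col.setter
    def y_col(self, value):
        self._y_col = value        # value is assigned first ...
        if value in self._d_cols:  # ... and only checked afterwards
            raise ValueError(
                f"{value} cannot be set as outcome variable ``y_col`` "
                "and treatment variable in ``d_cols``."
            )


print(setter_leaves_state_unchanged(BuggyData(), "y_col", "d"))  # False
```

With a setter that validates before assigning, the same helper would return True, which is the behaviour listed under Expected Result.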

Versions

Python 3.9.7, DoubleML 0.4.1, Scikit-Learn 1.0.1

MalteKurz, Mar 23 '22 09:03