doubleml-for-py
Issue in setter of y_col properties for objects of class DoubleMLData: Objects mutate despite the fact that a ValueError was raised
Describe the bug
Bug reported by @ShreyDixit:
Assume that an object of class DoubleMLData has been initialized successfully. A property like y_col is then altered in a way that violates a basic assumption (e.g., the same variable cannot be both the outcome variable y_col and a treatment variable in d_cols). This correctly raises a ValueError. However, the object is mutated anyway: the y_col property is changed, so the object now violates that assumption. The root cause is in the setter for the y_col property: https://github.com/DoubleML/doubleml-for-py/blob/0690cc65895feb73b9a338cabde290ee72cf0feb/doubleml/double_ml_data.py#L353-L365
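To make the mechanism concrete, here is a minimal sketch of the failure mode. ToyData and its attributes are hypothetical stand-ins written for illustration, not the DoubleML implementation; the only assumption carried over from the report is that the disjointness check reads values already stored on the object.

```python
class ToyData:
    """Hypothetical stand-in for a data container; not DoubleML's code."""

    def __init__(self, y_col, d_cols):
        self._y_col = y_col
        self._d_cols = list(d_cols)
        self._check_disjoint_sets()

    def _check_disjoint_sets(self):
        # The check reads the attributes that are already stored on the object.
        if self._y_col in self._d_cols:
            raise ValueError(
                f"{self._y_col} cannot be set as outcome variable ``y_col`` "
                "and treatment variable in ``d_cols``."
            )

    @property
    def y_col(self):
        return self._y_col

    @y_col.setter
    def y_col(self, value):
        self._y_col = value          # assignment happens first ...
        self._check_disjoint_sets()  # ... so a failure here leaves the new value in place


data = ToyData(y_col="y", d_cols=["d"])
try:
    data.y_col = "d"
except ValueError as err:
    print(err)
print(data.y_col)  # the invalid value was stored despite the exception
```

Because the exception is raised after the assignment, catching it does not restore the previous state.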
The value should not be set before all checks have passed. However, in its current form the _check_disjoint_sets() check requires that the properties have already been set. The same issue applies to the setters of other properties such as d_cols, x_cols, etc. Note, however, that the issue only becomes relevant once an object of class DoubleMLData has been initialized successfully and the user then alters one of the properties in a way that violates _check_disjoint_sets().
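One possible way to get validate-then-assign behaviour is to let the check take candidate values as arguments instead of reading stored state. The sketch below is only an illustration under that assumption: SafeToyData and the signature of its _check_disjoint_sets helper are hypothetical and do not reflect how DoubleML is or should be implemented.

```python
class SafeToyData:
    """Hypothetical container whose setter validates before assigning."""

    def __init__(self, y_col, d_cols):
        d_cols = list(d_cols)
        self._check_disjoint_sets(y_col, d_cols)
        self._y_col = y_col
        self._d_cols = d_cols

    @staticmethod
    def _check_disjoint_sets(y_col, d_cols):
        # Validates the candidate values passed in, not the stored attributes.
        if y_col in d_cols:
            raise ValueError(
                f"{y_col} cannot be set as outcome variable ``y_col`` "
                "and treatment variable in ``d_cols``."
            )

    @property
    def y_col(self):
        return self._y_col

    @y_col.setter
    def y_col(self, value):
        if not isinstance(value, str):
            raise TypeError("The outcome variable y_col must be of str type.")
        # Run every check on the candidate first; assign only if none raised.
        self._check_disjoint_sets(value, self._d_cols)
        self._y_col = value


safe_data = SafeToyData(y_col="y", d_cols=["d"])
try:
    safe_data.y_col = "d"
except ValueError as err:
    print(err)
print(safe_data.y_col)  # prints: y
```

An alternative with the same observable effect would be to remember the old value and restore it in an except block before re-raising; passing candidates to the check avoids the temporary invalid state altogether.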
Minimum reproducible code snippet
Code block 1
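from doubleml.datasets import make_plr_CCDDHNR2018

dml_data = make_plr_CCDDHNR2018()  # DoubleMLData object with outcome 'y' and treatment 'd'
print(dml_data.y_col)
dml_data.y_col = 'd'  # 'd' is already the treatment variable, so this raises a ValueError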
Code block 2
print(dml_data.y_col)
Expected Result
First code block: dml_data.y_col == 'y', and the assignment raises the exception
ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.
Second code block: dml_data.y_col == 'y' should still hold.
Actual Result
First code block: dml_data.y_col == 'y', and the assignment raises the exception
ValueError: d cannot be set as outcome variable ``y_col`` and treatment variable in ``d_cols``.
Second code block: dml_data.y_col == 'd'
Versions
Python 3.9.7, DoubleML 0.4.1, Scikit-Learn 1.0.1